Understanding information theoretic measures for comparing clusterings

Hanneke van der Hoef; Matthijs J. Warrens

Journal Article, Open Access

Behaviormetrika (2019) 46(2) 353-370

DOI: 10.1007/s41237-018-0075-7

30 Citations

26 Readers

Abstract

Many external validity indices for comparing different clusterings of the same set of objects are overall measures: they quantify similarity between clusterings for all clusters simultaneously. Because a single number only provides a general notion of what is going on, the values of such overall indices (usually between 0 and 1) are often difficult to interpret. In this paper, we show that a class of normalizations of the mutual information can be decomposed into indices that contain information on the level of individual clusters. The decompositions (1) reveal that overall measures can be interpreted as summary statistics of information reflected in the individual clusters, (2) specify how these overall indices are related to individual clusters, and (3) show that the overall indices are affected by cluster size imbalance. We recommend using measures for individual clusters since they provide much more detailed information than a single overall number.
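
The idea of reading an overall index as a summary of per-cluster indices can be illustrated with a minimal sketch. It assumes one particular normalization, mutual information divided by the entropy of the first clustering, I(U;V)/H(U), and splits both numerator and denominator into one term per cluster of U. The function name `cluster_decomposition` and the toy labels are hypothetical, and the per-cluster index used here is an illustration of the general principle, not necessarily the exact indices defined in the paper.

```python
from collections import Counter
from math import log

def cluster_decomposition(u, v):
    """Split I(U;V)/H(U) into per-cluster parts (illustrative sketch).

    u, v: equal-length sequences of cluster labels for the same objects.
    Assumes u contains at least two clusters, so that H(U) > 0.
    Returns (overall, per_cluster), where per_cluster maps each label of u
    to a pair (index, weight) and overall == sum(index * weight).
    """
    n = len(u)
    joint = Counter(zip(u, v))   # co-occurrence counts n_ij
    size_u = Counter(u)          # cluster sizes in u
    size_v = Counter(v)          # cluster sizes in v

    # Contribution of each cluster of u to the mutual information.
    info = {a: 0.0 for a in size_u}
    for (a, b), count in joint.items():
        p_ab = count / n
        info[a] += p_ab * log(p_ab * n * n / (size_u[a] * size_v[b]))

    # Contribution of each cluster of u to the entropy H(U).
    entropy = {a: -(c / n) * log(c / n) for a, c in size_u.items()}
    total_entropy = sum(entropy.values())

    overall = sum(info.values()) / total_entropy
    per_cluster = {
        a: (info[a] / entropy[a], entropy[a] / total_entropy) for a in size_u
    }
    return overall, per_cluster

# Toy data: cluster 0 of u is recovered exactly by v, clusters 1 and 2 are merged.
u = [0, 0, 0, 0, 1, 1, 2, 2]
v = [0, 0, 0, 0, 1, 1, 1, 1]
overall, per_cluster = cluster_decomposition(u, v)
for label, (index, weight) in sorted(per_cluster.items()):
    print(label, round(index, 3), round(weight, 3))
print("overall", round(overall, 3))
```

In this toy case the overall value (about 0.667) hides the fact that one cluster is matched perfectly (index 1.0) while the two merged clusters each score 0.5. The weights depend on cluster sizes through the entropy terms, which is one way to see why an overall index of this form is sensitive to cluster size imbalance.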

Citation (APA)

van der Hoef, H., & Warrens, M. J. (2019). Understanding information theoretic measures for comparing clusterings. Behaviormetrika, 46(2), 353–370. https://doi.org/10.1007/s41237-018-0075-7
