Clustering by similarity in an auxiliary space

Janne Sinkkonen; Samuel Kaski

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2000), vol. 1983, pp. 3-8

DOI: 10.1007/3-540-44491-2_1

Abstract

We present a clustering method for continuous data. It defines local clusters in the (primary) data space but derives its similarity measure from the posterior distributions of additional discrete data that occur as pairs with the primary data. As a case study, enterprises are clustered by deriving the similarity measure from bankruptcy sensitivity. In another case study, a content-based clustering for text documents is found by measuring differences between their metadata (keyword distributions). We show that minimizing our Kullback–Leibler divergence-based distortion measure within the categories is equivalent to maximizing the mutual information between the categories and the distributions in the auxiliary space. A simple on-line algorithm for minimizing the distortion is introduced for Gaussian basis functions and their analogs on a hypersphere.
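The equivalence stated in the abstract can be checked numerically on a finite sample. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names `soft_memberships` and `kl_distortion_and_mi` are hypothetical, the posteriors p(c|x) are assumed to be given and strictly positive, the cluster memberships are assumed to be normalized Gaussian basis functions, and each cluster prototype is assumed to be the membership-weighted mean posterior. Under those assumptions the membership-weighted Kullback–Leibler distortion plus the mutual information between clusters and auxiliary classes equals a quantity that does not depend on the clustering, so lowering one raises the other. The on-line update rule itself is not reproduced here.

```python
import numpy as np


def soft_memberships(x, centers, sigma=1.0):
    """Normalized Gaussian basis functions y_j(x); each row sums to 1.

    x: (n, d) primary data, centers: (K, d) cluster centers (assumed given).
    """
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    y = np.exp(logits)
    return y / y.sum(axis=1, keepdims=True)


def kl_distortion_and_mi(post, y):
    """Sample estimates of the KL-based distortion and of I(C; V).

    post: (n, C) strictly positive posteriors p(c | x_i), rows sum to 1.
    y:    (n, K) soft cluster memberships, rows sum to 1.
    """
    n = post.shape[0]
    joint = y.T @ post / n                  # (K, C): p(v_j, c)
    p_v = joint.sum(axis=1, keepdims=True)  # (K, 1): p(v_j)
    p_c = joint.sum(axis=0, keepdims=True)  # (1, C): p(c)
    psi = joint / p_v                       # (K, C): prototype p(c | v_j)

    # KL(p(c | x_i) || psi_j) for every sample i and cluster j.
    neg_entropy = (post * np.log(post)).sum(axis=1)[:, None]  # (n, 1)
    cross = post @ np.log(psi).T                              # (n, K)
    kl = neg_entropy - cross

    distortion = (y * kl).sum() / n
    mi = (joint * (np.log(joint) - np.log(p_v) - np.log(p_c))).sum()
    return distortion, mi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 2))                       # synthetic primary data
    post = rng.dirichlet(np.full(3, 2.0), size=200)     # synthetic p(c | x_i)
    for centers in (rng.normal(size=(4, 2)), rng.normal(size=(6, 2))):
        d, mi = kl_distortion_and_mi(post, soft_memberships(x, centers))
        # The sum is the same for both clusterings (up to rounding).
        print(d, mi, d + mi)
```

The data in the demo are synthetic and chosen only to exercise the identity; the constant sum is the sample mutual information between the auxiliary classes and the individual data points.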

Cite

Sinkkonen, J., & Kaski, S. (2000). Clustering by similarity in an auxiliary space. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1983, pp. 3–8). Springer Verlag. https://doi.org/10.1007/3-540-44491-2_1
