Clustering very large dissimilarity data sets

Citations: 4
Mendeley readers: 2

This article is free to access.

Abstract

Clustering and visualization constitute key issues in computer-supported data inspection, and a variety of promising tools exist for these tasks, such as the self-organizing map (SOM) and its variants. Real-life data, however, pose severe problems for standard data inspection: on the one hand, data are often represented by complex non-vectorial objects, so standard methods for finite-dimensional vectors in Euclidean space cannot be applied. On the other hand, very large data sets must be handled, such that the data neither fit into main memory nor allow more than one pass; standard methods simply cannot cope with the sheer amount of data. We present two recent extensions of topographic mappings: relational clustering, which can deal with general proximity data given by pairwise dissimilarities, and patch processing, which can process streaming data of arbitrary size in patches. Together, they yield an efficient linear-time data inspection method for general dissimilarity data. We present the theoretical background as well as applications to text and multimedia processing based on the generalized compression distance. © 2010 Springer-Verlag.
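As a rough illustration of the two ingredients described above, the following is a minimal sketch in Python; it is not the authors' implementation, and the names `ncd` and `relational_kmeans` together with all parameters are illustrative assumptions. It combines a zlib-based normalized compression distance (one concrete instance of a compression-based dissimilarity) with a batch relational k-means update in which each prototype is a convex combination alpha_j of data points, so that dissimilarities to prototypes can be computed from the matrix D of pairwise dissimilarities alone via d(x_i, w_j) = (D alpha_j)_i - 1/2 alpha_j^T D alpha_j, without any vectorial embedding.

```python
import zlib
import numpy as np

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, with zlib as the compressor."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

def relational_kmeans(D, k, iters=50, seed=0):
    """Batch relational k-means on a symmetric dissimilarity matrix D.

    Each prototype is a convex combination alpha_j of the data points;
    point-to-prototype dissimilarities are computed indirectly via
        d(x_i, w_j) = (D @ alpha_j)[i] - 0.5 * alpha_j @ D @ alpha_j,
    so no vectorial representation of the data is ever needed.
    """
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    alpha = np.zeros((k, n))
    # initialize each prototype on a distinct random data point
    alpha[np.arange(k), rng.choice(n, size=k, replace=False)] = 1.0
    for _ in range(iters):
        # (n, k) matrix of point-to-prototype dissimilarities
        dist = (alpha @ D).T - 0.5 * np.einsum('jn,nm,jm->j', alpha, D, alpha)
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = assign == j
            if members.any():
                alpha[j] = members / members.sum()  # prototype = mean of its cluster
    return assign

# Toy usage: cluster short byte strings by compressibility.
docs = [b"aaaaabbbbb" * 20, b"ababababab" * 20, b"q8#kz!m@r7" * 20, b"x2$vw%t9&p" * 20]
D = np.array([[ncd(a, b) for b in docs] for a in docs])
D = 0.5 * (D + D.T)  # symmetrize: compression is not exactly order-invariant
print(relational_kmeans(D, k=2))
```

Patch processing, which this sketch omits, would apply such an update to consecutive patches of a data stream, carrying the previous prototypes forward as weighted pseudo-data points, so that memory stays bounded and a single pass over the dissimilarity data suffices.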

Citation (APA)

Hammer, B., & Hasenfuss, A. (2010). Clustering very large dissimilarity data sets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5998 LNAI, pp. 259–273). https://doi.org/10.1007/978-3-642-12159-3_24
