Solving k-means on high-dimensional big data

Jan Philipp W. Kappmeier; Daniel R. Schmidt; Melanie Schmidt

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9125 259-270

DOI: 10.1007/978-3-319-20086-6_20

Abstract

In recent years, there have been major efforts to develop data stream algorithms that process inputs in one pass over the data with little memory requirement. For the k-means problem, this has led to the development of several (1 + ε)-approximations (under the assumption that k is a constant), but also to the design of algorithms that are extremely fast in practice and compute solutions of high accuracy. However, when not only the stream is long but the input points are also high-dimensional, current methods reach their limits. We propose two algorithms, piecy and piecy-mr, that are based on the recently developed data stream algorithm BICO and that can process high-dimensional data in one pass and output a solution of high quality. While piecy is suited for high-dimensional data with a medium number of points, piecy-mr is meant for high-dimensional data that comes in a very long stream. We provide an extensive experimental study of piecy and piecy-mr that shows the strength of the new algorithms.

APA

Kappmeier, J. P. W., Schmidt, D. R., & Schmidt, M. (2015). Solving k-means on high-dimensional big data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9125, pp. 259–270). Springer Verlag. https://doi.org/10.1007/978-3-319-20086-6_20
