Clustering large datasets using data stream clustering techniques

Matthew Bolaños; John Forrest; Michael Hahsler

Conference Proceedings

Studies in Classification, Data Analysis, and Knowledge Organization (2014) 47 135-143

DOI: 10.1007/978-3-319-01595-8_15

8 Citations

24 Readers

Abstract

Unsupervised identification of groups in large data sets is important for many machine learning and knowledge discovery applications. Conventional clustering approaches (k-means, hierarchical clustering, etc.) typically do not scale well to very large data sets. In recent years, data stream clustering algorithms have been proposed that can deal efficiently with potentially unbounded streams of data. This paper is the first to investigate the use of data stream clustering algorithms as lightweight alternatives to conventional algorithms on large non-streaming data. We discuss important issues, including order dependence, and report the results of an initial study using several synthetic and real-world data sets.
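The order dependence mentioned in the abstract can be illustrated with a small sketch. The abstract does not name the algorithms that were studied, so the code below assumes a simple single-pass, MacQueen-style sequential k-means as a stand-in; the function name `sequential_kmeans` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Sequence


def sequential_kmeans(stream: Iterable[Sequence[float]], k: int) -> List[List[float]]:
    """Single-pass sequential k-means (illustrative stand-in, not the paper's method).

    The first k points seed the centers. Every later point is read once and
    pulls its nearest center toward itself by a step of 1 / (points assigned).
    """
    centers: List[List[float]] = []
    counts: List[int] = []
    for x in stream:
        if len(centers) < k:
            # Seed phase: the first k points become the initial centers.
            centers.append([float(v) for v in x])
            counts.append(1)
            continue
        # Find the nearest center by squared Euclidean distance.
        j = min(
            range(k),
            key=lambda i: sum((c - v) ** 2 for c, v in zip(centers[i], x)),
        )
        # Update that center as the running mean of its assigned points.
        counts[j] += 1
        step = 1.0 / counts[j]
        centers[j] = [c + step * (v - c) for c, v in zip(centers[j], x)]
    return centers


# The same four 1-D points, presented in two different orders.
order_a = [[0.0], [1.0], [10.0], [11.0]]
order_b = [[0.0], [10.0], [1.0], [11.0]]

centers_a = sequential_kmeans(order_a, k=2)
centers_b = sequential_kmeans(order_b, k=2)

print("order A:", centers_a)
print("order B:", centers_b)
```

Because each point is seen only once, the two orderings of the same data end with different centers. With order B the seeds fall in separate groups and the centers settle at 0.5 and 10.5, while with order A both seeds fall in the same group and one center is dragged between the groups.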

Cite

Bolaños, M., Forrest, J., & Hahsler, M. (2014). Clustering large datasets using data stream clustering techniques. In Studies in Classification, Data Analysis, and Knowledge Organization (Vol. 47, pp. 135–143). Kluwer Academic Publishers. https://doi.org/10.1007/978-3-319-01595-8_15
