An SNN-DBSCAN Based Clustering Algorithm for Big Data

Sriniwas Pandey; Mamata Samal; Sraban Kumar Mohanty

Conference Proceedings

Advances in Intelligent Systems and Computing (2020) 1082 127-137

DOI: 10.1007/978-981-15-1081-6_11

Abstract

Clustering is a technique for partitioning data into groups such that data items within a group are more similar to each other than to data items in any other group. Most clustering algorithms are designed under the assumption of infinite main memory, but this assumption fails as the size of the data set grows. In that scenario, the data must be stored in secondary memory, and the time spent on input/output (I/O) dominates the actual computation time. Reducing I/O therefore improves the efficiency of clustering techniques. In this paper, a shared nearest neighbor (SNN) based algorithm is redesigned to minimize its I/O complexity, making it suitable for Big Data in the external memory model proposed by Aggarwal and Vitter. The computational steps are unchanged, so cluster quality remains the same. We implement the algorithm using the STXXL library to demonstrate its efficacy on Big Data sets.
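The abstract does not list the SNN-DBSCAN steps themselves. As a rough in-memory illustration only, the sketch below follows the commonly described shared-nearest-neighbor clustering scheme: find each point's k nearest neighbors, score a pair by how many neighbors they share, call a point "core" if enough neighbors reach the similarity threshold, and grow clusters DBSCAN-style from core points. It is not the authors' external-memory (STXXL) implementation. The function names, the mutual-kNN variant of the similarity, and the parameters `k`, `eps` and `min_pts` are assumptions made for this example.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def knn_sets(points: Sequence[Point], k: int) -> List[Set[int]]:
    """Indices of the k nearest neighbors of each point (brute force, self excluded)."""
    result = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        result.append(set(others[:k]))
    return result


def snn_similarity(knn: List[Set[int]], i: int, j: int) -> int:
    """Shared-neighbor count; assumed to be nonzero only for mutual k-NN pairs."""
    if j in knn[i] and i in knn[j]:
        return len(knn[i] & knn[j])
    return 0


def snn_dbscan(points: Sequence[Point], k: int, eps: int, min_pts: int) -> List[int]:
    """Return a cluster label per point; -1 marks noise (hypothetical helper)."""
    n = len(points)
    knn = knn_sets(points, k)
    # "Strong" links: neighbors whose SNN similarity reaches eps (symmetric relation).
    strong = [[j for j in knn[i] if snn_similarity(knn, i, j) >= eps] for i in range(n)]
    # SNN density = number of strong links; dense points are core points.
    core = [len(strong[i]) >= min_pts for i in range(n)]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        # Expand a new cluster from an unlabeled core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            u = stack.pop()
            for v in strong[u]:
                if labels[v] == -1:
                    labels[v] = cluster  # border or core point joins the cluster
                    if core[v]:
                        stack.append(v)  # only core points keep expanding
        cluster += 1
    return labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
           (50.0, 50.0)]
    print(snn_dbscan(pts, k=3, eps=2, min_pts=3))
```

On the toy data above, the two tight squares become clusters 0 and 1, and the isolated point is labeled noise. The brute-force neighbor search touches every point for every query, which is the kind of access pattern that becomes I/O-bound once the data no longer fits in main memory; since the abstract states the computational steps are unchanged, an external-memory version would differ in how data is read and written, which this sketch does not model.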

Cite (APA)

Pandey, S., Samal, M., & Mohanty, S. K. (2020). An SNN-DBSCAN Based Clustering Algorithm for Big Data. In Advances in Intelligent Systems and Computing (Vol. 1082, pp. 127–137). Springer. https://doi.org/10.1007/978-981-15-1081-6_11
