An SNN-DBSCAN Based Clustering Algorithm for Big Data


Abstract

Clustering is a technique for partitioning data into groups such that data items within a group are more similar to each other than to data items in any other group. Most clustering algorithms are designed under the assumption of unlimited main memory, but this assumption fails as the size of the data set grows. In that case, the data must be stored in secondary memory, and the time spent on input/output (I/O) operations dominates the actual computation time. Reducing I/O therefore improves the efficiency of clustering techniques. In this paper, a shared near neighbor based algorithm is adapted to minimize its I/O complexity, making it suitable for Big Data in the external memory model proposed by Aggarwal and Vitter. The computational steps are unchanged, so cluster quality remains the same. We implement the algorithm in the STXXL library to show its efficacy on Big Data sets.
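The following is a minimal in-memory Python sketch of one common formulation of shared-nearest-neighbour (SNN) DBSCAN. It is meant only to illustrate the computational steps the abstract refers to. It is not the authors' I/O-efficient version, because it keeps everything in main memory, which is exactly the assumption the paper drops. The following are all illustrative assumptions that may differ from the paper's definitions:

- the function names and the parameters `k`, `eps` and `min_pts`;
- the rule that similarity counts shared neighbours only between mutual k-nearest neighbours;
- the brute-force k-NN search.

```python
import math
from typing import List, Sequence, Set


def knn_lists(points: Sequence[Sequence[float]], k: int) -> List[Set[int]]:
    """For each point, the indices of its k nearest neighbours (itself excluded).

    Brute force, O(n^2): an assumption made for brevity only.
    """
    n = len(points)
    result = []
    for i in range(n):
        ranked = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        result.append({j for _, j in ranked[:k]})
    return result


def snn_similarity(knn: List[Set[int]], i: int, j: int) -> int:
    """Number of shared neighbours, counted only if i and j are mutual k-NN (assumed rule)."""
    if j in knn[i] and i in knn[j]:
        return len(knn[i] & knn[j])
    return 0


def snn_dbscan(points, k: int, eps: int, min_pts: int) -> List[int]:
    """Hypothetical in-memory SNN-DBSCAN sketch; returns a cluster label per point (-1 = noise)."""
    n = len(points)
    knn = knn_lists(points, k)
    # Similarity is nonzero only between mutual neighbours, so only k-NN lists are scanned.
    sim = [{j: snn_similarity(knn, i, j) for j in knn[i]} for i in range(n)]
    # SNN density: how many neighbours are at least eps-similar.
    density = [sum(1 for s in sim[i].values() if s >= eps) for i in range(n)]
    core = [d >= min_pts for d in density]

    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        # Grow a cluster over core points linked by SNN similarity >= eps (DBSCAN-style expansion).
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q, s in sim[p].items():
                if core[q] and labels[q] == -1 and s >= eps:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1

    # Attach each non-core point to its most similar core point, if that similarity reaches eps.
    for i in range(n):
        if core[i]:
            continue
        best_sim, best_q = max(((s, q) for q, s in sim[i].items() if core[q]), default=(0, -1))
        if best_q != -1 and best_sim >= eps:
            labels[i] = labels[best_q]
    return labels
```

For example, the call `snn_dbscan(points, k=4, eps=2, min_pts=3)` on two well-separated groups of five points each, plus one distant outlier, labels the groups 0 and 1 and marks the outlier as noise (-1).

In the paper's Big Data setting, the k-NN lists and similarity tables would not fit in main memory. Reorganising these steps to minimise I/O in the Aggarwal–Vitter model is the paper's contribution and is not reproduced here.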

Cite (APA)

Pandey, S., Samal, M., & Mohanty, S. K. (2020). An SNN-DBSCAN Based Clustering Algorithm for Big Data. In Advances in Intelligent Systems and Computing (Vol. 1082, pp. 127–137). Springer. https://doi.org/10.1007/978-981-15-1081-6_11
