K-DBSCAN: An improved DBSCAN algorithm for big data

Nahid Gholizadeh; Hamid Saadatfar; Nooshin Hanafi

Journal Article, Open Access


Journal of Supercomputing (2021) 77(6) 6214-6235

DOI: 10.1007/s11227-020-03524-3

57 Citations

89 Readers

Abstract

Storing and processing big data remain major challenges. Among data mining algorithms, DBSCAN is a widely used clustering method, but one of its main drawbacks is its low execution speed. This study aims to accelerate DBSCAN so that it can handle big datasets in an acceptable amount of time. To this end, the data are first partitioned into groups with the K-means++ algorithm, and DBSCAN is then run on each group separately. This reduces the computational burden of DBSCAN and significantly increases clustering speed. Finally, border clusters are merged where necessary. In the reported experiments, the proposed algorithm greatly reduced DBSCAN execution time (by 98% in the best case) with no significant change in the qualitative evaluation criteria for clustering.
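The three stages named in the abstract (K-means++ grouping, per-group DBSCAN, border merging) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `kmeanspp_groups`, `dbscan`, and `k_dbscan_sketch` are hypothetical, neighbor search is brute force, and the merge rule (unite clusters from different groups when any pair of their points lies within `eps`) is an assumption, since the abstract does not state the merge criterion.

```python
import math
import random

def kmeanspp_groups(points, k, rng, iters=20):
    """Group points with K-means++ seeding followed by Lloyd iterations."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Seed the next center with probability proportional to squared distance.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign  # group index of every point, from the last assignment step

def dbscan(points, idxs, eps, min_pts):
    """Brute-force DBSCAN restricted to idxs; returns {index: label}, -1 = noise."""
    def neighbors(i):
        return [j for j in idxs if math.dist(points[i], points[j]) <= eps]
    labels, cluster = {}, 0
    for i in idxs:
        if i in labels:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        labels[i] = cluster
        queue = list(nb)
        while queue:
            j = queue.pop()
            if labels.get(j, -1) != -1:
                continue  # already belongs to a cluster
            unvisited = j not in labels
            labels[j] = cluster
            if unvisited:
                nb_j = neighbors(j)
                if len(nb_j) >= min_pts:  # j is a core point: keep expanding
                    queue.extend(nb_j)
        cluster += 1
    return labels

def k_dbscan_sketch(points, k, eps, min_pts, seed=0):
    rng = random.Random(seed)
    assign = kmeanspp_groups(points, k, rng)
    labels = {}
    for g in range(k):  # stage 2: DBSCAN inside each group separately
        idxs = [i for i, a in enumerate(assign) if a == g]
        for i, lab in dbscan(points, idxs, eps, min_pts).items():
            labels[i] = -1 if lab == -1 else (g, lab)
    # Stage 3 (assumed rule): union-find merge of clusters from different
    # groups that have at least one pair of points within eps of each other.
    parent = {lab: lab for lab in set(labels.values()) if lab != -1}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    clustered = [i for i in range(len(points)) if labels[i] != -1]
    for a in clustered:
        for b in clustered:
            if (a < b and labels[a][0] != labels[b][0]
                    and math.dist(points[a], points[b]) <= eps):
                parent[find(labels[a])] = find(labels[b])
    ids = {r: n for n, r in enumerate(sorted({find(lab) for lab in parent}))}
    return [-1 if labels[i] == -1 else ids[find(labels[i])] for i in range(len(points))]

# Usage: two dense lines of points plus one isolated point.
pts = [(x * 0.5, 0.0) for x in range(20)] + [(100 + x * 0.5, 0.0) for x in range(20)]
pts.append((50.0, 50.0))
result = k_dbscan_sketch(pts, k=4, eps=0.6, min_pts=3)
```

The speedup the abstract describes comes from the second stage: each DBSCAN run only compares points within one group. The merge pass in this sketch compares all clustered pairs for simplicity, which gives back that saving; a practical version would presumably restrict the check to points near group boundaries. Note also that core-point status is decided from within-group neighbors only, so points close to a group boundary can be labeled differently than plain DBSCAN would label them.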

Cite

Gholizadeh, N., Saadatfar, H., & Hanafi, N. (2021). K-DBSCAN: An improved DBSCAN algorithm for big data. Journal of Supercomputing, 77(6), 6214–6235. https://doi.org/10.1007/s11227-020-03524-3

