Parallel Implementation of Statistical DBSCAN Algorithm for Spark-based Clustering on Google Cloud Platform


Abstract

We present a new parallel density-based spatial clustering of applications with noise (DBSCAN) algorithm for Spark on Google Cloud Platform (GCP). Statistical analysis is applied to determine DBSCAN's optimal parameters and thereby improve clustering performance. For scalability, cost-based R-tree partitioning is used to divide the dataset into balanced workloads according to its distribution. Parallel DBSCAN consists of three parts: local DBSCAN, partitioning, and merging. Optimizing the partitioning step is important for saving time and space compared with serial DBSCAN, and this approach can improve performance and time cost on large datasets. The modified statistical cost-based DBSCAN (SCbs-DBSCAN) is applied to standard UCI (University of California, Irvine) datasets, basic clustering benchmarks, and large datasets of different scales. The experimental results show that, in clustering performance and time cost, the proposed algorithm is about 10–15% more efficient and runs about 1.5x to 3x faster than an alternative parallel DBSCAN method on Spark, without sacrificing clustering quality.
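The abstract names three phases (partitioning, local DBSCAN, merging) plus a statistical choice of parameters. The Python sketch below illustrates that general pattern under stated assumptions; it is not the SCbs-DBSCAN implementation. It swaps the paper's cost-based R-tree partitioning for simple equal-width strips along the x-axis with an eps-wide halo, uses brute-force neighbour search, runs the "parallel" phase as a plain loop instead of Spark tasks, and picks eps from a quantile of the k-nearest-neighbour distances, a common heuristic that may differ from the authors' statistical analysis. The function names (`estimate_eps`, `strip_partitions`, `local_dbscan`, `parallel_dbscan`) and the `quantile` default are hypothetical.

```python
import math
from collections import defaultdict


def estimate_eps(points, k=4, quantile=0.9):
    """Hypothetical statistical eps choice: a quantile of the k-NN distance distribution.
    Index 0 of each sorted distance list is the point itself, so index k is its k-th neighbour."""
    kd = sorted(sorted(math.dist(p, q) for q in points)[k] for p in points)
    return kd[min(int(quantile * len(kd)), len(kd) - 1)]


def local_dbscan(points, eps, min_pts):
    """Serial DBSCAN with brute-force neighbour search.
    Returns (labels, core_flags); label -1 marks noise; min_pts counts the point itself."""
    neighbours = [[j for j, q in enumerate(points) if math.dist(p, q) <= eps] for p in points]
    core = [len(nb) >= min_pts for nb in neighbours]
    labels = [-1] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster
        frontier = [i]
        while frontier:
            j = frontier.pop()
            if not core[j]:
                continue  # border points join a cluster but do not expand it
            for n in neighbours[j]:
                if labels[n] == -1:
                    labels[n] = cluster
                    frontier.append(n)
        cluster += 1
    return labels, core


def strip_partitions(points, eps, n_parts):
    """Phase 1 (assumption): equal-width strips along x, NOT a cost-based R-tree.
    Each strip also receives halo points within eps of its edges, so the core
    status of the points it owns is computed exactly."""
    xs = [p[0] for p in points]
    lo = min(xs)
    width = (max(xs) - lo) / n_parts or 1.0
    owner = [min(int((x - lo) / width), n_parts - 1) for x in xs]
    members = [[i for i, x in enumerate(xs)
                if lo + k * width - eps <= x <= lo + (k + 1) * width + eps]
               for k in range(n_parts)]
    return owner, members


def parallel_dbscan(points, eps, min_pts, n_parts=2):
    owner, members = strip_partitions(points, eps, n_parts)
    memberships = defaultdict(list)  # point index -> [(strip, local cluster id)]
    is_core = [False] * len(points)
    # Phase 2: local DBSCAN per strip. On Spark these calls could run as
    # independent tasks (e.g. one per partition via mapPartitions).
    for k, idx in enumerate(members):
        labels, core = local_dbscan([points[i] for i in idx], eps, min_pts)
        for i, lab, c in zip(idx, labels, core):
            if lab != -1:
                memberships[i].append((k, lab))
            if owner[i] == k:
                is_core[i] = c  # only the owning strip sees the full eps-neighbourhood

    # Phase 3: merge local clusters that share a globally core point (union-find).
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, ms in memberships.items():
        if is_core[i]:
            for m in ms[1:]:
                parent[find(m)] = find(ms[0])

    # Relabel merged clusters as 0, 1, 2, ... in order of first appearance.
    ids, result = {}, []
    for i in range(len(points)):
        ms = memberships.get(i)
        result.append(ids.setdefault(find(ms[0]), len(ids)) if ms else -1)
    return result


if __name__ == "__main__":
    chain = [(0.5 * i, 0.0) for i in range(9)]  # x = 0.0 .. 4.0, crosses a strip edge
    blob = [(10.0, 0.0), (10.5, 0.0), (11.0, 0.0)]
    pts = chain + blob + [(6.0, 6.0)]           # last point is isolated noise
    print(estimate_eps(pts, k=1))                # 0.5
    print(parallel_dbscan(pts, eps=0.6, min_pts=2, n_parts=4))
```

With `eps=0.6` and `min_pts=2`, the nine-point chain straddles a strip boundary, is clustered separately in two strips, and is rejoined in the merge phase, so the output matches serial DBSCAN on the same points: the chain is cluster 0, the blob is cluster 1, and the isolated point is noise (-1). Merging only through points that are core in their owning strip keeps a border point that touches two clusters from fusing them.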



Citation (APA)

Awaad, A. M., & Hefny, H. (2023). Parallel Implementation of Statistical DBSCAN Algorithm for Spark-based Clustering on Google Cloud Platform. International Journal of Intelligent Engineering and Systems, 16(2), 279–290. https://doi.org/10.22266/ijies2023.0430.23
