Hashing-based approximate DBSCAN

Tianrun Li; Thomas Heinis; Wayne Luk

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016), vol. 9809 LNCS, pp. 31-45

DOI: 10.1007/978-3-319-44039-2_3

3 Citations

4 Readers

Abstract

Analyzing massive amounts of data and extracting value from them has become key across disciplines. As data volumes grow rapidly, however, current approaches to data analysis struggle. This is particularly true for clustering algorithms, where distance calculations between pairs of points dominate the overall running time. Data analysis and clustering are also rarely straightforward: parameters typically have to be determined over several iterations. Entirely accurate results are thus rarely needed, and we can instead sacrifice some precision in the final result to accelerate the computation. In this paper we develop ADvaNCE, a new approach to approximating DBSCAN. ADvaNCE uses two measures to reduce the overhead of distance calculations: (1) locality-sensitive hashing to approximate and speed up distance calculations and (2) representative point selection to reduce the number of distance calculations. In our experiments, the approach is in general one order of magnitude faster than the state of the art, with a maximum speedup of 30x.
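To make the first measure concrete, the following is a minimal sketch of how locality-sensitive hashing can narrow down the candidates for an eps-neighborhood query, the core operation in DBSCAN. It is not the ADvaNCE implementation, which the abstract does not spell out. It assumes a standard random-projection (p-stable) hash family, and the names `make_hash` and `approx_neighbors` as well as the parameters `num_tables`, `k`, and the bucket width of `4 * eps` are hypothetical choices made for illustration only.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, width, k, rng):
    """Build one LSH function from k random projections (p-stable scheme).

    Illustrative assumption: the abstract does not say which hash family
    ADvaNCE uses.
    """
    projections = [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
        for _ in range(k)
    ]

    def h(point):
        # Each projection maps the point to an integer bucket index.
        return tuple(
            math.floor((sum(a_i * x_i for a_i, x_i in zip(a, point)) + b) / width)
            for a, b in projections
        )

    return h


def approx_neighbors(points, eps, num_tables=8, k=2, seed=0):
    """Return, for each point, the indices of eps-neighbors found via LSH.

    Only points that share a bucket in at least one hash table are compared,
    so some true neighbors may be missed, but no false neighbors are returned.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    candidates = [set() for _ in points]

    for _ in range(num_tables):
        # Bucket width of 4 * eps is an arbitrary illustrative choice.
        h = make_hash(dim, width=4 * eps, k=k, rng=rng)
        buckets = defaultdict(list)
        for i, p in enumerate(points):
            buckets[h(p)].append(i)
        for members in buckets.values():
            for i in members:
                candidates[i].update(members)

    # Exact distance checks are run on the candidates only.
    return [
        sorted(
            j for j in cand
            if j != i and math.dist(points[i], points[j]) <= eps
        )
        for i, cand in enumerate(candidates)
    ]
```

Because exact distances are still checked within each bucket, this sketch trades recall, not precision, for speed: more hash tables recover more true neighbors at the cost of more comparisons. The second measure in the abstract, representative point selection, is not shown here.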

Citation (APA)

Li, T., Heinis, T., & Luk, W. (2016). Hashing-based approximate DBSCAN. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9809 LNCS, pp. 31–45). Springer Verlag. https://doi.org/10.1007/978-3-319-44039-2_3
