GPU-accelerated similarity self-join for multi-dimensional data

Abstract

The similarity self-join finds all objects in a dataset that are within a search distance, ε, of each other. As such, the self-join is a building block of many algorithms. In high dimensions, indexing structures become increasingly ineffective at pruning the search, making the self-join challenging to compute efficiently. We advance a GPU-accelerated self-join algorithm targeted towards high-dimensional data. The massive parallelism and high aggregate memory bandwidth afforded by the GPU make the architecture well-suited for data-intensive workloads. We leverage a grid-based GPU-tailored index to perform range queries, and propose the following optimizations: (i) a trade-off between candidate set filtering and index search overhead by exploiting properties of the index; (ii) reordering the data based on variance in each dimension to improve the filtering power of the index; and (iii) a pruning method for reducing the number of expensive distance calculations. Our algorithm generally outperforms a state-of-the-art parallel CPU approach.
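
To make the ideas in the abstract concrete, here is a minimal sequential sketch in Python. It is not the authors' GPU implementation. It assumes Euclidean distance, a uniform grid with cell side ε, and three simplified stand-ins for the listed optimizations: indexing only the first `indexed_dims` dimensions (fewer indexed dimensions mean fewer neighbor cells to search but weaker filtering), sorting dimensions by decreasing variance before indexing, and aborting a distance computation once the partial sum exceeds ε². The names `self_join`, `indexed_dims`, and `_within` are hypothetical.

```python
from collections import defaultdict
from itertools import product
from statistics import pvariance


def _within(a, b, eps_sq):
    """Squared-distance test that stops as soon as the partial sum exceeds eps^2."""
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
        if total > eps_sq:
            return False  # short-circuit: remaining dimensions cannot help
    return True


def self_join(points, eps, indexed_dims=2):
    """Return sorted index pairs (i, j), i < j, whose distance is <= eps."""
    if not points:
        return []
    n_dims = len(points[0])

    # Assumed heuristic: put high-variance dimensions first so that the
    # indexed dimensions spread the points over as many cells as possible.
    order = sorted(range(n_dims),
                   key=lambda d: pvariance([p[d] for p in points]),
                   reverse=True)
    data = [[p[d] for d in order] for p in points]
    k = min(indexed_dims, n_dims)

    # Grid over the first k reordered dimensions only; cell side is eps,
    # so any match for a point lies in its own cell or an adjacent one.
    grid = defaultdict(list)
    for i, p in enumerate(data):
        grid[tuple(int(p[d] // eps) for d in range(k))].append(i)

    eps_sq = eps * eps
    result = []
    for cell, members in grid.items():
        for offset in product((-1, 0, 1), repeat=k):
            neighbor = tuple(c + o for c, o in zip(cell, offset))
            for i in members:
                for j in grid.get(neighbor, ()):
                    # i < j reports each unordered pair exactly once.
                    if i < j and _within(data[i], data[j], eps_sq):
                        result.append((i, j))
    return sorted(result)
```

For example, `self_join([(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 3.0, 3.0), (3.0, 3.4, 3.0)], eps=1.0)` returns `[(0, 1), (2, 3)]`. Raising `indexed_dims` filters more candidates per query but multiplies the number of neighbor cells visited (3^k), which is the trade-off the abstract refers to in (i).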

Cite

Gowanlock, M., & Karsin, B. (2019). GPU-accelerated similarity self-join for multi-dimensional data. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. Association for Computing Machinery. https://doi.org/10.1145/3329785.3329920
