The similarity self-join finds all pairs of objects in a dataset that are within a search distance, ε, of each other. The self-join is therefore a building block of many algorithms. In high dimensions, indexing structures become increasingly ineffective at pruning the search, which makes the self-join challenging to compute efficiently. We advance a GPU-accelerated self-join algorithm targeted towards high-dimensional data. The massive parallelism afforded by the GPU and its high aggregate memory bandwidth make the architecture well suited to data-intensive workloads. We leverage a grid-based, GPU-tailored index to perform range queries and propose the following optimizations: (i) a trade-off between candidate set filtering and index search overhead by exploiting properties of the index; (ii) reordering the data based on the variance in each dimension to improve the filtering power of the index; and (iii) a pruning method that reduces the number of expensive distance calculations. Our algorithm generally outperforms a state-of-the-art parallel CPU approach.
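To make the three ideas concrete, here is a minimal, sequential CPU sketch in Python. It is not the authors' GPU implementation, and all names (`reorder_by_variance`, `within_eps`, `self_join`, the parameter `k`) are hypothetical. It assumes Euclidean distance, a uniform grid with cells of side ε built over only the first `k` dimensions (one way to trade index filtering against search overhead), dimensions permuted so the highest-variance ones are indexed, and an early-exit distance check as one simple form of pruning. Because any two points within ε differ by at most ε in every coordinate, their cell coordinates differ by at most 1, so only the 3^k adjacent cells need to be searched.

```python
import math
from collections import defaultdict
from itertools import product


def reorder_by_variance(points):
    """Permute dimensions so the highest-variance dimensions come first.

    Returns the reordered points and the dimension order used.
    Distances are unchanged by a permutation of dimensions.
    """
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    var = [sum((p[j] - means[j]) ** 2 for p in points) / n for j in range(d)]
    order = sorted(range(d), key=lambda j: var[j], reverse=True)
    return [tuple(p[j] for j in order) for p in points], order


def within_eps(a, b, eps2):
    """Return True if squared distance <= eps2, stopping early once it is exceeded."""
    s = 0.0
    for x, y in zip(a, b):
        diff = x - y
        s += diff * diff
        if s > eps2:  # prune: remaining dimensions can only increase the sum
            return False
    return True


def self_join(points, eps, k=2):
    """Return sorted index pairs (i, j), i < j, with distance <= eps.

    The grid indexes only the first k (highest-variance) dimensions.
    """
    if not points:
        return []
    pts, _ = reorder_by_variance(points)
    k = min(k, len(pts[0]))

    # Bucket each point into a grid cell of side eps over the first k dims.
    grid = defaultdict(list)
    for i, p in enumerate(pts):
        cell = tuple(math.floor(p[j] / eps) for j in range(k))
        grid[cell].append(i)

    eps2 = eps * eps
    result = []
    for cell, members in grid.items():
        # Candidates lie in this cell or one of its adjacent cells.
        for offset in product((-1, 0, 1), repeat=k):
            neighbor = tuple(c + o for c, o in zip(cell, offset))
            for j in grid.get(neighbor, ()):
                for i in members:
                    # i < j reports each unordered pair exactly once.
                    if i < j and within_eps(pts[i], pts[j], eps2):
                        result.append((i, j))
    return sorted(result)
```

For example, `self_join([(0, 0), (0.5, 0), (3, 3)], eps=1.0)` returns `[(0, 1)]`. Raising `k` shrinks each candidate set but multiplies the number of neighboring cells (3^k) to probe, which illustrates the kind of filtering-versus-search-overhead trade-off described in (i).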
Gowanlock, M., & Karsin, B. (2019). GPU-accelerated similarity self-join for multi-dimensional data. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. Association for Computing Machinery. https://doi.org/10.1145/3329785.3329920