Distributed Stream KNN Join

16Citations
Citations of this article
11Readers
Mendeley users who have this article in their library.
Get full text

Abstract

kNN join over data streams is an important operation for location-aware systems, which correlates events from different sources based on their occurrence locations. Combining the complexity of kNN join and the dynamicity of data streams, kNN join in streaming environments is a computationally intensive operator, and its performance can be greatly improved by utilizing the computational capabilities of modern non-uniform memory access (NUMA) computing platforms. However, the conventional approaches to kNN join for prestored datasets do not work efficiently with the kind of highly dynamic data found in streaming environments. Therefore, in this paper, we introduce an adaptive scalable stream kNN join, named ADS-kNN, to address the challenges of performing the kNN join operation on highly dynamic data. We propose a multistage kNN execution plan that enables high-performance kNN queries in distributed settings by overlapping the computation and communication stages. Moreover, we propose an adaptive data partitioning scheme that dynamically adjusts the load among the operators according to the changes in the input values. Combining these two techniques, ADS-kNN provides a scalable and adaptive kNN join operator for data streams. Our experiments using a 56-core system show that ADS-kNN achieves a maximum throughput that is 21 times higher than that of a single-threaded approach.

Cite

CITATION STYLE

APA

Shahvarani, A., & Jacobsen, H. A. (2021). Distributed Stream KNN Join. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 1597–1609). Association for Computing Machinery. https://doi.org/10.1145/3448016.3457269

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free