FastThetaJoin: An Optimization on Multi-way Data Stream θ-join with Range Constraints

Abstract

In this paper, we propose FastThetaJoin, an optimization technique for the θ-join operation on multi-way data streams, an essential query used in many data analytics tasks. The θ-join on multi-way data streams is notoriously difficult because it incurs a tremendous shuffle cost from moving data between multiple operation components, which makes it hard to implement efficiently in a distributed environment. Like previous methods, FastThetaJoin tries to minimize the number of θ-joins, but it differs from them in how it makes partitions, deletes unnecessary data items, and performs the Cartesian product. FastThetaJoin not only effectively minimizes the number of θ-joins but also substantially improves the efficiency of its operations in a distributed environment. We implemented FastThetaJoin in the Spark Streaming framework, with an efficient bucket implementation of parameterized windows. The experimental results show that, compared with existing solutions, our method speeds up θ-join processing while reducing its overhead. The specific effect of the optimization is correlated with the nature of the data streams: the greater the data difference, the more apparent the optimization effect.
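To make the idea of range constraints concrete, the following is a minimal sketch of generic range-based pruning for a two-way θ-join with the predicate `a < b`. It is not the authors' algorithm and does not use Spark Streaming. The function names `partition_by_range` and `range_pruned_less_than_join`, the `num_parts` parameter, and the choice of predicate are all assumptions made for illustration. Each input is split into value-range partitions. A partition pair whose ranges cannot satisfy the predicate is skipped, a pair whose ranges always satisfy it is emitted as a plain Cartesian product, and only overlapping pairs are compared element by element.

```python
from itertools import product


def partition_by_range(values, num_parts):
    """Sort values and split them into contiguous chunks.

    Returns a list of (min, max, items) tuples, one per non-empty chunk.
    Hypothetical helper for illustration only.
    """
    ordered = sorted(values)
    size = max(1, -(-len(ordered) // num_parts))  # ceiling division
    chunks = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    return [(chunk[0], chunk[-1], chunk) for chunk in chunks]


def range_pruned_less_than_join(left, right, num_parts=4):
    """Return every pair (a, b) with a < b and the number of comparisons made."""
    left_parts = partition_by_range(left, num_parts)
    right_parts = partition_by_range(right, num_parts)
    results, checks = [], 0
    for lmin, lmax, litems in left_parts:
        for rmin, rmax, ritems in right_parts:
            if lmin >= rmax:
                # Every a is >= every b, so no pair can satisfy a < b.
                continue
            if lmax < rmin:
                # Every a is < every b: plain Cartesian product, no comparisons.
                results.extend(product(litems, ritems))
                continue
            # Ranges overlap: fall back to checking each pair.
            for a, b in product(litems, ritems):
                checks += 1
                if a < b:
                    results.append((a, b))
    return results, checks
```

In this sketch, the further apart the value ranges of the two inputs are, the more partition pairs fall into the first two branches and the fewer element comparisons remain. This is one plausible reading of the abstract's remark about data difference, not a result taken from the paper.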

Cite

Hu, Z., Fan, X., Wang, Y., & Xu, C. (2020). FastThetaJoin: An Optimization on Multi-way Data Stream θ -join with Range Constraints. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12452 LNCS, pp. 174–189). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-60245-1_12