RRPJ: Result-rate based progressive relational join

Abstract

Progressive join algorithms are join algorithms that produce results incrementally as input data becomes available. Because they are non-blocking, they are particularly suitable for online processing of data streams. Representative algorithms of this family are the symmetric hash join, the X-join and, more recently, the rate-based progressive join (RPJ). The symmetric hash join introduces the idea of processing the input streams symmetrically but assumes sufficient main memory. The X-join suggests that processing can scale to very large amounts of data if main memory is regularly flushed to disk and a reactive/cleanup phase is triggered for disk-resident data. Its flushing strategy is a simple largest-first strategy: the largest partition is flushed to disk. The recently proposed RPJ decides which main-memory tuples or partitions should be flushed to disk, in order to maximize throughput, by computing their probabilities of contributing to a result. In this paper, we discuss the limitations of RPJ and propose a novel extension, called Result Rate-based Progressive Join (RRPJ), which addresses these limitations. Instead of computing the probabilities from statistics over the input data, RRPJ directly observes the output (result) statistics. This not only yields better performance but also simplifies the generalization of the algorithm to non-relational data such as multidimensional and hierarchical data. We empirically show that RRPJ is effective and efficient and outperforms the state-of-the-art RPJ. We also investigate the relevance and performance of an adaptive version of these algorithms using amortization parameters. © Springer-Verlag Berlin Heidelberg 2007.
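
To make the contrast between flushing policies concrete, here is a minimal Python sketch (not the authors' implementation) of a symmetric hash join with a bounded tuple budget and a pluggable eviction policy. `largest_first` follows the X-join's largest-first strategy described above. `lowest_result_rate` is a hypothetical, simplified stand-in for a result-rate-based policy in the spirit of RRPJ, assuming a partition's score is just the number of results it has produced per resident tuple; the paper's actual statistics and RPJ's probability model are not reproduced. All class and function names are illustrative, and the disk-resident cleanup phase is omitted.

```python
from collections import defaultdict

SIDES = ("R", "S")


class SymmetricHashJoin:
    """Hypothetical in-memory symmetric hash join with a tuple budget.

    Each arriving tuple probes the opposite input's hash table (emitting
    matches immediately) and is then inserted into its own table. When the
    budget is exceeded, `policy` picks one (side, partition) to evict.
    Evicted tuples are only recorded in `self.flushed`; the disk-resident
    cleanup phase of the real algorithms is not modeled.
    """

    def __init__(self, key, num_partitions, memory_limit, policy):
        self.key = key
        self.num_partitions = num_partitions
        self.memory_limit = memory_limit
        self.policy = policy
        # tables[side][p] maps a join-key value to the resident tuples with that key.
        self.tables = {s: [defaultdict(list) for _ in range(num_partitions)] for s in SIDES}
        self.sizes = {s: [0] * num_partitions for s in SIDES}
        # results[side][p] counts results produced by partition p since it was last flushed.
        self.results = {s: [0] * num_partitions for s in SIDES}
        self.flushed = []  # list of (side, partition, evicted tuples)

    def insert(self, side, tup):
        other = "S" if side == "R" else "R"
        k = self.key(tup)
        p = hash(k) % self.num_partitions
        # Probe: .get() does not create an entry in the defaultdict.
        matches = self.tables[other][p].get(k, [])
        out = [(tup, m) if side == "R" else (m, tup) for m in matches]
        # Output statistics: credit both partitions that took part in the results.
        self.results[side][p] += len(out)
        self.results[other][p] += len(out)
        # Build: keep the new tuple for future probes from the other input.
        self.tables[side][p][k].append(tup)
        self.sizes[side][p] += 1
        if sum(map(sum, self.sizes.values())) > self.memory_limit:
            self._flush()
        return out

    def _flush(self):
        side, p = self.policy(self)
        evicted = [t for bucket in self.tables[side][p].values() for t in bucket]
        self.flushed.append((side, p, evicted))
        self.tables[side][p] = defaultdict(list)
        self.sizes[side][p] = 0
        self.results[side][p] = 0


def _resident(join):
    # All non-empty (side, partition) pairs currently held in memory.
    return [(s, p) for s in SIDES for p in range(join.num_partitions) if join.sizes[s][p] > 0]


def largest_first(join):
    """X-join-style policy: evict the largest resident partition."""
    return max(_resident(join), key=lambda sp: join.sizes[sp[0]][sp[1]])


def lowest_result_rate(join):
    """Assumed result-rate policy: evict the resident partition whose tuples
    have produced the fewest results per resident tuple."""
    return min(_resident(join), key=lambda sp: join.results[sp[0]][sp[1]] / join.sizes[sp[0]][sp[1]])


def run(policy):
    # Integer join keys, so hash(k) % 2 is deterministic across runs.
    join = SymmetricHashJoin(key=lambda t: t[0], num_partitions=2, memory_limit=4, policy=policy)
    stream = [("R", (0, "a")), ("R", (0, "b")), ("S", (0, "x")),
              ("R", (1, "c")), ("R", (0, "d")), ("S", (0, "y"))]
    results = []
    for side, tup in stream:
        results.extend(join.insert(side, tup))
    return results, join.flushed
```

With this stream, `run(largest_first)` evicts the productive partition holding the three R tuples with key 0 just before `(0, "y")` arrives, so it yields 3 results, while `run(lowest_result_rate)` evicts the idle partition holding `(1, "c")` and yields 6. In the X-join the missed matches would be produced later in the cleanup phase rather than lost; the sketch only illustrates which results a policy allows to be produced early.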

Tok, W. H., Bressan, S., & Lee, M. L. (2007). RRPJ: Result-rate based progressive relational join. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4443 LNCS, pp. 43–54). Springer Verlag. https://doi.org/10.1007/978-3-540-71703-4_6