Distributed Join Processing between Streaming and Stored Big Data under the Micro-Batch Model

Young Ho Jeon; Ki Hoon Lee; Ho Jun Kim

Journal Article, Open Access

IEEE Access (2019) 7 34583-34598

DOI: 10.1109/ACCESS.2019.2904730

15 Citations

18 Readers

Abstract

In order to interpret, enrich, and analyze streaming data, stream applications often access data stored in an external database. Although there have been many studies on stream processing, little attention has been paid so far to the join between streaming data and stored data. In this paper, we propose a comprehensive solution called DS-join for distributed processing of this join under the micro-batch model of recent distributed stream processing engines (SPEs), such as Spark Streaming. The micro-batch model performs stream processing as a series of very small batch jobs and is more fault-tolerant in a distributed environment than the record-at-a-time model. DS-join reduces the number of database accesses by using micro-batching. Furthermore, DS-join optimizes the join operation by minimizing data shuffling, managing a cache in the distributed SPE, parallelizing the join processing, and balancing the load between the SPE and the external database system. Experimental results using real and synthetic datasets show that, compared with state-of-the-art methods, DS-join significantly improves throughput, especially for large databases.
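The abstract's point that micro-batching reduces database accesses can be illustrated with a small sketch. This is not the authors' DS-join algorithm; it only shows the general idea of collecting the distinct join keys of one micro-batch, serving what it can from a cache, and fetching the rest in a single round trip. The names `FakeStore`, `LruCache`, `multi_get`, and `join_micro_batch` are hypothetical, and the in-memory store merely stands in for an external database.

```python
from collections import OrderedDict


class FakeStore:
    """Hypothetical stand-in for an external database; counts round trips."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.round_trips = 0

    def multi_get(self, keys):
        # One call models one batched lookup against the database.
        self.round_trips += 1
        return {k: self.rows[k] for k in keys if k in self.rows}


class LruCache:
    """Small least-recently-used cache (an assumed policy, for illustration)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)  # mark as most recently used
            return True, self.data[key]
        return False, None

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used


def join_micro_batch(batch, store, cache):
    """Inner-join (key, payload) stream records with stored rows."""
    matched = {}
    missing = []
    # Look up each distinct key once, preserving first-seen order.
    for key in dict.fromkeys(k for k, _ in batch):
        hit, row = cache.get(key)
        if hit:
            matched[key] = row
        else:
            missing.append(key)
    if missing:
        # All cache misses of the batch share a single database access.
        fetched = store.multi_get(missing)
        for key, row in fetched.items():
            cache.put(key, row)
        matched.update(fetched)
    return [(k, payload, matched[k]) for k, payload in batch if k in matched]
```

With a record-at-a-time model, each stream record would trigger its own lookup; here a batch of any size costs at most one round trip, and none at all when every key is already cached.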

Cite

Jeon, Y. H., Lee, K. H., & Kim, H. J. (2019). Distributed Join Processing between Streaming and Stored Big Data under the Micro-Batch Model. IEEE Access, 7, 34583–34598. https://doi.org/10.1109/ACCESS.2019.2904730
