Big data multi-query optimisation with Apache Flink

Abstract

Big data analytic frameworks such as MapReduce, Spark and Flink have recently gained popularity for processing large datasets. Flink is an open-source, Apache-hosted big data analytic framework for processing batch and streaming data. For historical (batch) data processing, Flink’s query optimiser is built on techniques used in parallel database systems. The optimiser translates queries into jobs, and these jobs are repeatedly submitted with similar tasks, so exploiting the similarity between tasks can avoid redundant computation. This paper proposes a Flink multi-query optimisation system, Flink-MQO, built on top of the Flink software stack as an add-on to Apache Flink that optimises multiple queries through data sharing. Flink-MQO exploits the data sharing opportunities of selection operators to eliminate redundant, duplicated in-network data movement across queries. Experimental results show that exploiting shared selection operators in big data multi-query workloads can provide promising query execution times. Flink-MQO could therefore potentially be used in stream processing to improve the performance of real-time applications.
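The idea of sharing selection operators across queries can be illustrated with a small sketch. This is not the Flink-MQO implementation and does not use any Flink API; it is plain Python under the assumption that several queries filter the same input, and every name in it (`run_separately`, `run_shared`, `ROWS`, `QUERIES`) is hypothetical. It only compares how many rows are read when each query scans the input on its own with how many are read when one shared scan evaluates all selection predicates.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
Predicate = Callable[[Row], bool]
Results = Dict[str, List[Row]]


def run_separately(rows: List[Row], queries: Dict[str, Predicate]) -> Tuple[Results, int]:
    """Baseline: every query performs its own full scan of the input."""
    rows_read = 0
    results: Results = {}
    for name, predicate in queries.items():
        matched = []
        for row in rows:
            rows_read += 1  # the same row is read once per query
            if predicate(row):
                matched.append(row)
        results[name] = matched
    return results, rows_read


def run_shared(rows: List[Row], queries: Dict[str, Predicate]) -> Tuple[Results, int]:
    """Shared scan: each row is read once and tested against all predicates."""
    rows_read = 0
    results: Results = {name: [] for name in queries}
    for row in rows:
        rows_read += 1  # a single read serves every query
        for name, predicate in queries.items():
            if predicate(row):
                results[name].append(row)
    return results, rows_read


# Hypothetical input and queries, made up for illustration only.
ROWS: List[Row] = [{"id": i, "price": i * 10} for i in range(1, 11)]
QUERIES: Dict[str, Predicate] = {
    "q1": lambda r: r["price"] > 50,
    "q2": lambda r: r["price"] > 70,
    "q3": lambda r: r["id"] % 2 == 0,
}

if __name__ == "__main__":
    separate, reads_separate = run_separately(ROWS, QUERIES)
    shared, reads_shared = run_shared(ROWS, QUERIES)
    print("same results:", separate == shared)
    print("rows read separately:", reads_separate)
    print("rows read with shared scan:", reads_shared)
```

Both functions return identical per-query results; the shared variant reads the input once rather than once per query. In a distributed setting, the corresponding saving would be in repeated data movement, which is the redundancy the abstract refers to.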

Cite

Sahal, R., Khafagy, M. H., & Omara, F. A. (2018). Big data multi-query optimisation with Apache Flink. International Journal of Web Engineering and Technology, 13(1), 78–97. https://doi.org/10.1504/IJWET.2018.092401
