Recently, there has been significant interest in performing real-time analytical queries in systems that can handle both “big data” and “fast data”. In this paper, we propose an approximate answering approach, called ROSE, that can manage big and fast data streams and support complex analytical queries against them. To achieve this goal, we start with an analysis of existing query processing techniques in big data systems to understand the requirements for building a distributed analytic sketch. We then propose a sampling-based sketch that can extract multi-faced samples from asynchronous data streams, and we augment its usability with accuracy-lossless distributed sketch construction operations, such as splitting, merging, and union. Experimental results on real-world data sets indicate that, compared with the state-of-the-art approximate answering engine BlinkDB, our techniques obtain more accurate estimates and improve system throughput by a factor of 2. Compared with the distributed in-memory computing system Spark, our system improves query response time by 2 orders of magnitude.
CITATION STYLE
Wu, G., Yun, X., Li, C., Wang, S., Wang, Y., Zhang, X., … Zhang, G. (2017). Supporting real-time analytic queries in big and fast data environments. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10178 LNCS, pp. 477–493). Springer Verlag. https://doi.org/10.1007/978-3-319-55699-4_29