In Big Data contexts, SQL-on-Hadoop tools usually provide satisfactory performance for processing vast amounts of data, although newly emerging tools may offer an alternative. This paper evaluates whether Apache Druid, an innovative column-oriented data store suited to online analytical processing workloads, is a viable alternative to some of the well-known SQL-on-Hadoop technologies, and assesses its potential in this role. In this evaluation, Druid, Hive and Presto are benchmarked with increasing data volumes. The results point to Druid as a strong alternative, achieving better performance than Hive and Presto, and show the potential of integrating Hive and Druid, enhancing the capabilities of both tools.
CITATION STYLE
Correia, J., Costa, C., & Santos, M. Y. (2019). Challenging SQL-on-Hadoop Performance with Apache Druid. In Lecture Notes in Business Information Processing (Vol. 353, pp. 149–161). Springer Verlag. https://doi.org/10.1007/978-3-030-20485-3_12