Distributed Tree-Pattern Matching in Big Data Analytics Systems

Abstract

Big data analytics systems such as Apache Spark offer built-in support for nested data, which is abundant, for instance, in the JSON data available online. However, these systems typically have to transform the data before deeply nested elements can be accessed for further processing. This adds complexity to big data analytics pipelines and may incur unnecessary runtime overhead. This paper therefore introduces tree-pattern matching as a first-class operator in big data analytics systems. The operator reduces the complexity of big data analytics pipelines and accelerates pipeline processing by up to a factor of four compared to state-of-the-art pipelines for nested data. Its novelty lies in the distributed, data-parallel processing supported by its underlying tree-pattern matching algorithm. Experiments validate that the operator, implemented in Spark, reduces both pipeline complexity and runtime.
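
To make the idea of matching a tree pattern against nested records more concrete, the following is a minimal sketch in plain Python. It is not the algorithm or the Spark operator from the paper. The pattern encoding (a dict whose values are `None`, a nested pattern, or a predicate), the function names `match` and `match_partition`, and the sample records are all assumptions made for illustration. Partitions are modelled as plain lists that are matched independently of one another, which only hints at how data-parallel evaluation could work.

```python
def match(node, pattern):
    """Return True if the nested value `node` contains the tree pattern.

    `pattern` (hypothetical encoding) maps a field name to:
      - None: the field only has to exist,
      - a dict: the child value has to match that nested pattern,
      - a callable: the predicate has to hold for the value
        (or for at least one element if the value is a list).
    Lists are matched existentially: one element has to match.
    """
    if isinstance(node, list):
        return any(match(item, pattern) for item in node)
    if not isinstance(node, dict):
        return False
    for key, sub in pattern.items():
        if key not in node:
            return False
        value = node[key]
        if sub is None:
            continue
        if callable(sub):
            values = value if isinstance(value, list) else [value]
            if not any(sub(v) for v in values):
                return False
        elif not match(value, sub):
            return False
    return True


def match_partition(records, pattern):
    """Filter one partition; no state is shared between partitions."""
    return [record for record in records if match(record, pattern)]


# Illustrative nested records, loosely shaped like JSON from a social feed.
tweets = [
    {"id": 1, "user": {"name": "ann"},
     "entities": {"hashtags": [{"text": "spark"}, {"text": "json"}]}},
    {"id": 2, "user": {"name": "bob"}, "entities": {"hashtags": []}},
    {"id": 3, "user": {"name": "cy"},
     "entities": {"hashtags": [{"text": "spark"}]}},
]

# Pattern: a user with a name, and some hashtag whose text is "spark".
pattern = {
    "user": {"name": None},
    "entities": {"hashtags": {"text": lambda text: text == "spark"}},
}

# Two simulated partitions, each processed on its own.
partitions = [tweets[:2], tweets[2:]]
matched = [r for part in partitions for r in match_partition(part, pattern)]
matched_ids = [r["id"] for r in matched]
print(matched_ids)
```

The pattern is applied to the records in their nested form, so no flattening step is needed before the filter. Because every record is checked in isolation, the per-partition results can simply be concatenated.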

Citation (APA)

Diestelkämper, R., & Herschel, M. (2020). Distributed Tree-Pattern Matching in Big Data Analytics Systems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12245 LNCS, pp. 171–186). Springer. https://doi.org/10.1007/978-3-030-54832-2_14
