Distributed Tree-Pattern Matching in Big Data Analytics Systems

Ralf Diestelkämper; Melanie Herschel

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 12245 LNCS, pp. 171–186 (2020)

DOI: 10.1007/978-3-030-54832-2_14

Abstract

Big data analytics systems such as Apache Spark offer built-in support for nested data, which is abundant, for instance, in JSON data available online. However, these systems typically have to transform the data to gain access to (deeply) nested data for further processing. This adds complexity to big data analytics pipelines and may incur unnecessary runtime overhead. Therefore, this paper introduces tree-pattern matching as a first-class operator in big data analytics systems. The operator reduces the complexity of big data analytics pipelines and accelerates pipeline processing by up to a factor of four compared to state-of-the-art pipelines for nested data. Its novelty lies in the distributed and data-parallel processing supported by its underlying tree-pattern matching algorithm. Experiments validate that our operator, implemented in Spark, can reduce both pipeline complexity and runtime.
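To make the idea of matching a tree pattern directly against nested records concrete, the following is a minimal sketch in plain Python. It is not the paper's algorithm and uses no Spark API; the `matches` function, the pattern encoding (nested dicts, with `None` as a wildcard leaf), and the sample records are all assumptions made for illustration. Each record is tested independently, which is the property that would allow such a check to run in a data-parallel fashion.

```python
from typing import Any


def matches(node: Any, pattern: Any) -> bool:
    """Return True if the nested value `node` contains `pattern` (hypothetical helper)."""
    # None is a wildcard leaf: the attribute only has to exist.
    if pattern is None:
        return True
    # Arrays are traversed implicitly: at least one element must match.
    if isinstance(node, list):
        return any(matches(item, pattern) for item in node)
    # A dict pattern requires every named child to match below this node.
    if isinstance(pattern, dict):
        if not isinstance(node, dict):
            return False
        return all(
            key in node and matches(node[key], sub)
            for key, sub in pattern.items()
        )
    # Any other pattern is a constant that the leaf value must equal.
    return node == pattern


# Illustrative nested records, shaped like JSON documents (made-up data).
records = [
    {"user": {"name": "ann"}, "tweets": [{"lang": "en", "tags": [{"text": "spark"}]}]},
    {"user": {"name": "bob"}, "tweets": [{"lang": "de", "tags": [{"text": "flink"}]}]},
    {"user": {"name": "cy"}, "tweets": []},
]

# Pattern: the record has a user name and some tweet with a tag whose text is "spark".
pattern = {"user": {"name": None}, "tweets": {"tags": {"text": "spark"}}}

# Each record is matched on its own, without flattening the nested arrays first.
matched = [r["user"]["name"] for r in records if matches(r, pattern)]
print(matched)
```

In this sketch the nested arrays are never flattened into a relational form before filtering; the pattern is evaluated on the nested structure itself.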

Cite

Diestelkämper, R., & Herschel, M. (2020). Distributed Tree-Pattern Matching in Big Data Analytics Systems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12245 LNCS, pp. 171–186). Springer. https://doi.org/10.1007/978-3-030-54832-2_14
