Aura: A Flexible Dataflow Engine for Scalable Data Processing

Tobias Herb; Lauritz Thamsen; Thomas Renner; Odej Kao

Book Chapter

Springer International Publishing (2016), pp. 117–126

DOI: 10.1007/978-3-319-39589-0_9

Abstract

This paper describes Aura, a parallel dataflow engine for the analysis of large-scale datasets on commodity clusters. Aura lets users compose program plans from relational operators and second-order functions, provides automatic program parallelization and optimization, and offers a scalable and efficient runtime. Furthermore, Aura provides dedicated support for control flow, so that advanced analysis programs can be executed as a single dataflow job. As a result, data preprocessing, iterative algorithms, or even logic that depends on the outcome of a preceding dataflow need not be expressed as multiple separate jobs. Instead, the engine handles the entire dataflow program as one job, which makes it possible to keep intermediate results in memory and to consider the entire program during plan optimization, for example to re-use partitions.
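
The single-job idea in the abstract can be illustrated with a toy sketch. This is not Aura's API: the function name `run_as_single_job`, its parameters, and the sample data are all hypothetical, and the "engine" is just an in-process Python function. It only shows how preprocessing, an iterative step, and a data-dependent termination check can live in one program whose intermediate results never leave memory.

```python
from typing import Callable, List, Tuple

def run_as_single_job(
    data: List[float],
    step: Callable[[List[float]], List[float]],
    converged: Callable[[List[float]], bool],
    max_iters: int = 100,
) -> Tuple[List[float], int]:
    """Toy stand-in for one dataflow job (hypothetical, not Aura's API)."""
    # Preprocessing stage: drop negative records.
    current = [x for x in data if x >= 0]
    iterations = 0
    # Control flow inside the job: the loop condition depends on the
    # outcome of the preceding step, and `current` stays in memory
    # between iterations instead of being written out per job.
    while iterations < max_iters and not converged(current):
        current = step(current)
        iterations += 1
    return current, iterations

# Usage: halve every value until all values are below 1.
result, iterations = run_as_single_job(
    [8.0, -3.0, 2.0],
    step=lambda xs: [x / 2 for x in xs],
    converged=lambda xs: max(xs) < 1,
)
print(result, iterations)  # [0.5, 0.125] 4
```

In a multi-job setup, each loop iteration and the convergence check would typically be submitted separately, with intermediate data exchanged through external storage; the sketch keeps everything in one call to mirror the contrast the abstract draws.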

Citation (APA)

Herb, T., Thamsen, L., Renner, T., & Kao, O. (2016). Aura: A Flexible Dataflow Engine for Scalable Data Processing. In Tools for High Performance Computing 2015 (pp. 117–126). Springer International Publishing. https://doi.org/10.1007/978-3-319-39589-0_9
