This paper describes Aura, a parallel dataflow engine for analyzing large-scale datasets on commodity clusters. Aura lets users compose program plans from relational operators and second-order functions, provides automatic program parallelization and optimization, and offers a scalable and efficient runtime. Furthermore, Aura provides dedicated support for control flow, allowing advanced analysis programs to be executed as a single dataflow job. This way, it is not necessary to express, for example, data preprocessing, iterative algorithms, or even logic that depends on the outcome of a preceding dataflow as multiple separate jobs. Instead, the engine handles the entire dataflow program as one job, which allows it to keep intermediate results in memory and to consider the entire program during plan optimization, for example to re-use partitions.
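To illustrate the idea of running preprocessing, an iterative algorithm, and a data-dependent loop condition as one job, here is a minimal single-process sketch in Python. It is not Aura's API: the names `map_op`, `filter_op`, and `run_job` are hypothetical, partitions are plain in-memory lists, and parallel execution across a cluster is not modeled.

```python
from typing import Callable, List, Tuple

Partition = List[float]
Operator = Callable[[List[Partition]], List[Partition]]


def map_op(fn: Callable[[float], float]) -> Operator:
    # Second-order function: lifts a per-record UDF to an operator over all partitions.
    return lambda parts: [[fn(x) for x in p] for p in parts]


def filter_op(pred: Callable[[float], bool]) -> Operator:
    # Second-order function: keeps only the records for which the UDF returns True.
    return lambda parts: [[x for x in p if pred(x)] for p in parts]


def run_job(
    parts: List[Partition],
    preprocess: List[Operator],
    step: Operator,
    converged: Callable[[List[Partition], List[Partition]], bool],
    max_iters: int = 100,
) -> Tuple[List[Partition], int]:
    # Hypothetical "single job": preprocessing, the loop body, and the loop
    # condition all run here, so intermediate partitions never leave memory.
    data = parts
    for op in preprocess:
        data = op(data)
    for i in range(max_iters):
        new = step(data)
        # Control flow depends on the outcome of the preceding dataflow step.
        if converged(data, new):
            return new, i + 1
        data = new
    return data, max_iters


if __name__ == "__main__":
    result, iters = run_job(
        [[4.0, -1.0, 16.0], [64.0, 2.0]],
        preprocess=[filter_op(lambda x: x > 0)],
        step=map_op(lambda x: x / 2),
        converged=lambda old, new: max(x for p in new for x in p) < 1.0,
    )
    print(result, iters)
```

In a separate-jobs design, each loop iteration (and the convergence check) would typically be submitted as its own job, with intermediate results written out between jobs; keeping the loop inside one job is what enables in-memory intermediates and whole-program optimization.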
Herb, T., Thamsen, L., Renner, T., & Kao, O. (2016). Aura: A Flexible Dataflow Engine for Scalable Data Processing. In Tools for High Performance Computing 2015 (pp. 117–126). Springer International Publishing. https://doi.org/10.1007/978-3-319-39589-0_9