Aura: A Flexible Dataflow Engine for Scalable Data Processing

  • Herb T
  • Thamsen L
  • Renner T
  • et al.

Abstract

This paper describes Aura, a parallel dataflow engine for analyzing large-scale datasets on commodity clusters. Aura allows program plans to be composed from relational operators and second-order functions, provides automatic program parallelization and optimization, and offers a scalable and efficient runtime. Furthermore, Aura provides dedicated support for control flow, allowing advanced analysis programs to be executed as a single dataflow job. As a result, data preprocessing, iterative algorithms, or even logic that depends on the outcome of a preceding dataflow need not be expressed as multiple separate jobs. Instead, the engine handles the entire dataflow program as one job, which allows it to keep intermediate results in memory and to consider the entire program during plan optimization, for example to re-use partitions.
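To make the single-job idea concrete, the following is a minimal sketch and not Aura's actual API. Every name in it (`Plan`, `map`, `filter`, `loop_while`, `run`) is hypothetical and chosen only for illustration. It assumes a single-process, in-memory setting and shows how preprocessing, an iterative step, and a data-dependent loop condition can live in one plan, so that intermediate results are passed along in memory rather than handed between separate jobs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

# Hypothetical illustration only: none of these names come from Aura.
Step = Callable[[List[Any]], List[Any]]


@dataclass
class Plan:
    """A linear plan whose steps all run as one job over in-memory data."""
    steps: List[Step] = field(default_factory=list)

    def map(self, fn: Callable[[Any], Any]) -> "Plan":
        # Second-order function: apply fn to every record.
        self.steps.append(lambda data: [fn(x) for x in data])
        return self

    def filter(self, pred: Callable[[Any], bool]) -> "Plan":
        # Second-order function: keep records for which pred holds.
        self.steps.append(lambda data: [x for x in data if pred(x)])
        return self

    def loop_while(self, cond: Callable[[List[Any]], bool], body: "Plan") -> "Plan":
        # Control flow inside the plan: rerun `body` while `cond` holds
        # on the current intermediate result.
        def step(data: List[Any]) -> List[Any]:
            while cond(data):
                data = body.run(data)
            return data
        self.steps.append(step)
        return self

    def run(self, data: List[Any]) -> List[Any]:
        # Intermediate results are handed from step to step in memory.
        for step in self.steps:
            data = step(data)
        return data


# Preprocessing and an iterative part expressed as a single plan.
halve = Plan().map(lambda x: x / 2)
plan = (
    Plan()
    .filter(lambda x: x > 0)                   # preprocessing
    .map(float)
    .loop_while(lambda d: max(d) > 1, halve)   # data-dependent iteration
)
result = plan.run([-4, 8, 3, 16])
print(result)
```

Because the loop condition is evaluated on the intermediate result inside `run`, no driver program has to submit one job per iteration. A real engine would additionally parallelize and optimize the plan, which this sketch does not attempt.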

Cite

Herb, T., Thamsen, L., Renner, T., & Kao, O. (2016). Aura: A Flexible Dataflow Engine for Scalable Data Processing. In Tools for High Performance Computing 2015 (pp. 117–126). Springer International Publishing. https://doi.org/10.1007/978-3-319-39589-0_9
