Optimal task ordering in chain data flows: Exploring the practicality of non-scalable solutions

Abstract

Modern data flows generalize traditional Extract-Transform-Load and data integration workflows in order to enable end-to-end data processing and analytics. The more complex they become, the more pressing the need for automated optimization solutions. Data flow optimization comes in several forms, among which optimal task ordering is one of the most challenging. We take a practical approach: motivated by real-world examples, such as those captured by the TPC-DI benchmark, we argue that exhaustive non-scalable solutions are indeed a valid choice for chain flows. Our contribution is a thorough discussion of the three main directions for exhaustive enumeration of task ordering alternatives, namely backtracking, dynamic programming and topological sorting, together with concrete evidence of the chain-flow size and level of flexibility up to which they can be applied.
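
To make the idea of exhaustive enumeration concrete, the following is a minimal sketch of the backtracking direction, not the authors' implementation. It rests on assumptions that the abstract does not state: each task has a per-unit cost and a selectivity, the cost of an ordering is the sum of each task's cost weighted by the product of the selectivities of the tasks placed before it, and flexibility is expressed as precedence constraints between tasks. The names `best_ordering`, `costs`, `sels` and `preds` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def best_ordering(
    costs: List[float],
    sels: List[float],
    preds: Dict[int, Set[int]],
) -> Tuple[float, List[int]]:
    """Enumerate every ordering that respects the precedence constraints
    and return the cheapest one as (total_cost, ordering).

    Assumed cost model (not taken from the source): a task processes the
    data volume left by all tasks before it, so it contributes
    volume * costs[t], and then scales the volume by sels[t].
    Costs are assumed non-negative, which makes the pruning step safe.
    """
    n = len(costs)
    best: Tuple[float, List[int]] = (float("inf"), [])

    def extend(order: List[int], placed: Set[int],
               volume: float, cost_so_far: float) -> None:
        nonlocal best
        # Prune: a partial ordering that already costs at least as much
        # as the best complete one cannot lead to an improvement.
        if cost_so_far >= best[0]:
            return
        if len(order) == n:
            best = (cost_so_far, order[:])
            return
        for t in range(n):
            # A task is eligible once all of its prerequisites are placed.
            if t in placed or not preds.get(t, set()) <= placed:
                continue
            order.append(t)
            placed.add(t)
            extend(order, placed, volume * sels[t],
                   cost_so_far + volume * costs[t])
            # Backtrack: undo the choice and try the next candidate.
            order.pop()
            placed.remove(t)

    extend([], set(), 1.0, 0.0)
    return best


# Three tasks; task 2 must run after task 0, task 1 is unconstrained.
example_cost, example_order = best_ordering(
    costs=[1.0, 2.0, 1.0],
    sels=[1.0, 0.5, 0.1],
    preds={2: {0}},
)
```

In the worst case, with no precedence constraints, the search visits all n! orderings, which is why such an approach is only practical for flows of limited size or limited flexibility.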

Cite

Kougka, G., & Gounaris, A. (2017). Optimal task ordering in chain data flows: Exploring the practicality of non-scalable solutions. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10440 LNCS, pp. 19–32). Springer Verlag. https://doi.org/10.1007/978-3-319-64283-3_2