Data-intensive, long-running applications structured as workflows are increasingly being dispatched to cloud computing systems. Current approaches to scheduling dependency graphs fail to deliver high resource efficiency at low computation cost, especially for continuous data processing workflows, where the scheduler performs no reasoning about the impact that new input data may have on the workflow's final output. To face this challenge, we introduce a new scheduling criterion, Quality-of-Data (QoD), which describes the requirements on input data that make the triggering of workflow tasks worthwhile. Based on the QoD notion, we propose a novel service-oriented scheduler planner for continuous data processing workflows that is capable of enforcing QoD constraints and of guiding the scheduling toward resource efficiency, controlled overall performance, and task prioritization. To contrast the advantages of our scheduling model against others, we developed WaaS (Workflow-as-a-Service), a workflow coordinator system for the cloud in which data is shared among tasks via a columnar cloud database. © Springer International Publishing Switzerland 2014.
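The core idea behind QoD-constrained triggering can be sketched as a guard that fires a workflow task only once the accumulated changes to its input data cross some bound. The sketch below is purely illustrative and not taken from the paper: the three bounds (maximum staleness in seconds, maximum number of pending updates, maximum cumulative value divergence) and all class and method names are hypothetical.

```python
import time


class QoDGuard:
    """Illustrative Quality-of-Data trigger (hypothetical API).

    A task guarded by this object runs only when at least one bound is
    crossed: its input has been stale too long, too many updates are
    pending, or the accumulated magnitude of changes is too large.
    """

    def __init__(self, max_staleness, max_updates, max_divergence):
        self.max_staleness = max_staleness    # seconds since last run
        self.max_updates = max_updates        # pending update count
        self.max_divergence = max_divergence  # sum of |value changes|
        self.reset()

    def reset(self):
        """Call after the task runs: changes are consumed."""
        self.last_run = time.time()
        self.pending_updates = 0
        self.divergence = 0.0

    def record_update(self, delta):
        """Account for one change of magnitude |delta| to the input data."""
        self.pending_updates += 1
        self.divergence += abs(delta)

    def should_trigger(self):
        """True when any of the three QoD bounds has been reached."""
        return (
            time.time() - self.last_run >= self.max_staleness
            or self.pending_updates >= self.max_updates
            or self.divergence >= self.max_divergence
        )
```

Under this model, small or infrequent updates are absorbed without re-executing downstream tasks, which is where the resource savings over eager, change-driven scheduling come from.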
CITATION STYLE
Esteves, S., & Veiga, L. (2014). Planning and scheduling data processing workflows in the cloud with quality-of-data constraints. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8377 LNCS, pp. 324–338). Springer Verlag. https://doi.org/10.1007/978-3-319-06859-6_29