Cost-Efficient Tasks and Data Co-Scheduling with AffordHadoop

Abstract

With today’s massive jobs spanning thousands of tasks each, cost-optimality has become more important than ever. Modern distributed data processing paradigms can be significantly more sensitive to dollar cost than to makespan, especially for long jobs deployed in commercial clouds. This paper posits that dollar costs cannot be minimized unless data and tasks are scheduled simultaneously. We introduce the problem of cost-efficient co-scheduling for highly data-intensive jobs in the cloud, such as MapReduce jobs. We show that while the problem is solvable in polynomial time in some cases, the general problem is NP-hard. We propose to tackle it with integer programming techniques coupled with heuristic reduction and optimization, enabling a near-real-time solution. AffordHadoop, a pluggable co-scheduler for Hadoop, is implemented as an example of such a co-scheduler. AffordHadoop can save up to 48 percent of the overall dollar costs when compared to existing schedulers and provides significant flexibility in fine-tuning the cost-performance tradeoff.
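
To illustrate why placing data and tasks together can lower dollar cost, the following is a minimal toy sketch in Python. It is not the paper's formulation and it is not an integer program: it simply enumerates every placement of a tiny instance. The cost model (per-node CPU price, per-node storage price, a flat transfer price) and all names and numbers are assumptions made for illustration only.

```python
from itertools import product


def plan_cost(task_node, block_node, tasks, blocks,
              cpu_cents, storage_cents, transfer_cents):
    """Dollar cost (in cents) of one placement under the assumed toy model."""
    # Storage: every block pays the storage price of the node holding it.
    cost = sum(blocks[b] * storage_cents[n] for b, n in block_node.items())
    for t, (b, hours) in tasks.items():
        # Compute: the task pays the CPU price of the node it runs on.
        cost += hours * cpu_cents[task_node[t]]
        # Transfer: paid only when the task and its input block are apart.
        if block_node[b] != task_node[t]:
            cost += blocks[b] * transfer_cents
    return cost


def cheapest_plan(tasks, blocks, nodes, cpu_cents, storage_cents,
                  transfer_cents, fixed_blocks=None):
    """Exhaustive search over placements; feasible only for tiny instances.

    If fixed_blocks is given, data stays where it is and only tasks move
    (task-only scheduling). Otherwise data and tasks are placed jointly.
    """
    task_names, block_names = list(tasks), list(blocks)
    best = None
    for t_choice in product(nodes, repeat=len(task_names)):
        task_node = dict(zip(task_names, t_choice))
        if fixed_blocks is not None:
            block_options = [tuple(fixed_blocks[b] for b in block_names)]
        else:
            block_options = product(nodes, repeat=len(block_names))
        for b_choice in block_options:
            block_node = dict(zip(block_names, b_choice))
            c = plan_cost(task_node, block_node, tasks, blocks,
                          cpu_cents, storage_cents, transfer_cents)
            if best is None or c < best[0]:
                best = (c, task_node, block_node)
    return best


# Hypothetical instance: task -> (input block, CPU hours); block -> size in GB.
tasks = {"t1": ("b1", 2), "t2": ("b2", 5)}
blocks = {"b1": 10, "b2": 1}
nodes = ["cheap_cpu", "cheap_disk"]
cpu_cents = {"cheap_cpu": 10, "cheap_disk": 30}      # cents per CPU hour
storage_cents = {"cheap_cpu": 5, "cheap_disk": 1}    # cents per GB stored
transfer_cents = 2                                   # cents per GB moved

joint = cheapest_plan(tasks, blocks, nodes, cpu_cents, storage_cents,
                      transfer_cents)
task_only = cheapest_plan(tasks, blocks, nodes, cpu_cents, storage_cents,
                          transfer_cents,
                          fixed_blocks={"b1": "cheap_cpu", "b2": "cheap_cpu"})
print(joint[0], task_only[0])  # 103 125
```

In this invented instance, joint placement stores both blocks on the node with cheap storage and runs both tasks on the node with cheap CPU, for 103 cents, while scheduling tasks around data pinned to the CPU node costs 125 cents. Exhaustive search grows exponentially with the number of tasks and blocks, which is consistent with the abstract's motivation for integer programming plus heuristic reduction on realistic job sizes.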

Cite

Ehsan, M., Chandrasekaran, K., Chen, Y., & Sion, R. (2019). Cost-Efficient Tasks and Data Co-Scheduling with AffordHadoop. IEEE Transactions on Cloud Computing, 7(3), 719–732. https://doi.org/10.1109/TCC.2017.2702661
