Cost-Efficient Tasks and Data Co-Scheduling with AffordHadoop

Moussa Ehsan; Karthiek Chandrasekaran; Yao Chen; Radu Sion

Journal Article (Open Access)

IEEE Transactions on Cloud Computing (2019) 7(3) 719-732

DOI: 10.1109/TCC.2017.2702661

Abstract

With today’s massive jobs spanning thousands of tasks each, cost-optimality has become more important than ever. Modern distributed data processing paradigms can be significantly more sensitive to cost than makespan, especially for long jobs deployed in commercial clouds. This paper posits that minimized dollar costs cannot be achieved unless data and tasks are scheduled simultaneously. In this paper, we introduce the problem of cost-efficient co-scheduling for highly data-intensive jobs in the cloud, such as MapReduce jobs. We show that while the problem is solvable in polynomial time in some cases, the general problem is NP-hard. We propose to tackle the problem by using integer programming techniques coupled with heuristic reduction and optimization to enable a near-realtime solution. AffordHadoop, a pluggable co-scheduler for Hadoop, is implemented as an example of such a co-scheduler. AffordHadoop can save up to 48 percent of the overall dollar costs when compared to existing schedulers and provides significant flexibility in fine-tuning the cost-performance tradeoff.
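
The claim that data and tasks must be scheduled together can be illustrated with a toy model. The sketch below is not the paper's formulation and not AffordHadoop code: the node names, prices (in cents), cost model, and function names are all invented for illustration, and exhaustive enumeration stands in for the integer programming approach the abstract describes. It compares the cheapest task placement under a fixed data placement with the cheapest joint placement of tasks and data.

```python
from itertools import product

# Hypothetical toy instance; every number here is an assumption, in cents.
NODES = ["cheap", "fast"]
CPU_PRICE = {"cheap": 100, "fast": 300}    # cost of running one task on a node
STORE_PRICE = {"cheap": 50, "fast": 20}    # cost of holding one block on a node
TRANSFER_PRICE = 200                       # cost of reading a block remotely
TASK_INPUT = {"t1": "b1", "t2": "b2"}      # each task reads exactly one block
DEFAULT_PLACEMENT = {"b1": "fast", "b2": "fast"}  # data placed before scheduling


def total_cost(task_node, block_node):
    """Dollar cost (in cents) of one complete task and data placement."""
    cost = sum(STORE_PRICE[node] for node in block_node.values())
    for task, node in task_node.items():
        cost += CPU_PRICE[node]
        if block_node[TASK_INPUT[task]] != node:
            cost += TRANSFER_PRICE  # task and its input block are not co-located
    return cost


def assignments(items):
    """Yield every mapping from the given items to nodes."""
    items = sorted(set(items))
    for combo in product(NODES, repeat=len(items)):
        yield dict(zip(items, combo))


def best_tasks_only(block_node):
    """Cheapest cost when only tasks may move and data placement is fixed."""
    return min(total_cost(t, block_node) for t in assignments(TASK_INPUT))


def best_co_schedule():
    """Cheapest cost when tasks and data blocks are placed jointly."""
    return min(
        total_cost(t, b)
        for t in assignments(TASK_INPUT)
        for b in assignments(TASK_INPUT.values())
    )


if __name__ == "__main__":
    print("tasks only :", best_tasks_only(DEFAULT_PLACEMENT))
    print("co-schedule:", best_co_schedule())
```

In this invented instance, no task placement can avoid either the expensive node or a remote read once the data is pinned, whereas the joint search can move both tasks and blocks to the cheaper node. Enumeration grows exponentially with the number of tasks and blocks, which is consistent with the abstract's point that the general problem needs integer programming plus heuristic reduction to be solved in near real time.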

Cite

Ehsan, M., Chandrasekaran, K., Chen, Y., & Sion, R. (2019). Cost-Efficient Tasks and Data Co-Scheduling with AffordHadoop. IEEE Transactions on Cloud Computing, 7(3), 719–732. https://doi.org/10.1109/TCC.2017.2702661
