Managing data transfers in computer clusters with Orchestra

Abstract

Cluster computing applications like MapReduce and Dryad transfer massive amounts of data between their computation stages. These transfers can have a significant impact on job performance, accounting for more than 50% of job completion times. Despite this impact, there has been relatively little work on optimizing the performance of these data transfers, with networking researchers traditionally focusing on per-flow traffic management. We address this limitation by proposing a global management architecture and a set of algorithms that (1) improve the transfer times of common communication patterns, such as broadcast and shuffle, and (2) allow scheduling policies at the transfer level, such as prioritizing a transfer over other transfers. Using a prototype implementation, we show that our solution improves broadcast completion times by up to 4.5X compared to the status quo in Hadoop. We also show that transfer-level scheduling can reduce the completion time of high-priority transfers by 1.7X.
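To illustrate the distinction the abstract draws between per-flow traffic management and transfer-level scheduling, here is a minimal sketch. It is not the algorithm from the paper: it assumes a single shared link, and the `Transfer` class, its `weight` field, the transfer names, and the capacity figure are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    name: str
    flows: int     # number of concurrent flows belonging to this transfer
    weight: float  # hypothetical transfer-level weight (higher = more important)


def per_flow_share(transfers, capacity):
    """Split capacity equally per flow, so a transfer's rate scales with its flow count."""
    total_flows = sum(t.flows for t in transfers)
    return {t.name: capacity * t.flows / total_flows for t in transfers}


def per_transfer_share(transfers, capacity):
    """Split capacity by transfer weight, independent of how many flows each transfer opens."""
    total_weight = sum(t.weight for t in transfers)
    return {t.name: capacity * t.weight / total_weight for t in transfers}


# Illustrative numbers only: one shared link of 1000 units of bandwidth.
transfers = [
    Transfer("high_priority_shuffle", flows=2, weight=3.0),
    Transfer("background_broadcast", flows=8, weight=1.0),
]
flow_level = per_flow_share(transfers, capacity=1000.0)
transfer_level = per_transfer_share(transfers, capacity=1000.0)

print(flow_level)
print(transfer_level)
```

Under per-flow sharing, the transfer that opens more flows receives most of the link regardless of its importance. Under the weighted transfer-level split, the allocation follows the assigned weights instead, which is the kind of policy a global view of transfers makes possible.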

Chowdhury, M., Zaharia, M., Ma, J., Jordan, M. I., & Stoica, I. (2011). Managing data transfers in computer clusters with Orchestra. Computer Communication Review, 41(4), 98–109. https://doi.org/10.1145/2043164.2018448
