Abstract
Cluster computing applications like MapReduce and Dryad transfer massive amounts of data between their computation stages. These transfers can have a significant impact on job performance, accounting for more than 50% of job completion times. Despite this impact, there has been relatively little work on optimizing the performance of these data transfers, with networking researchers traditionally focusing on per-flow traffic management. We address this limitation by proposing a global management architecture and a set of algorithms that (1) improve the transfer times of common communication patterns, such as broadcast and shuffle, and (2) allow scheduling policies at the transfer level, such as prioritizing a transfer over other transfers. Using a prototype implementation, we show that our solution improves broadcast completion times by up to 4.5X compared to the status quo in Hadoop. We also show that transfer-level scheduling can reduce the completion time of high-priority transfers by 1.7X.
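The difference between per-flow traffic management and transfer-level scheduling can be shown with a deliberately simplified model. The sketch below is a toy and not Orchestra's implementation. It assumes a single shared link of fixed capacity, fluid (perfectly divisible) bandwidth, and two hypothetical transfers `A` and `B`. Under per-flow fair sharing, every active flow gets an equal share of the link. Under transfer-level priority, the highest-priority transfer gets the whole link and its flows split it evenly. The names `Transfer`, `simulate`, `"per_flow_fair"` and `"transfer_priority"` are illustrative assumptions.

```python
from dataclasses import dataclass

EPS = 1e-9  # remaining size at or below this counts as fully transferred


@dataclass
class Transfer:
    name: str
    flow_sizes: list[float]  # size of each flow in the transfer (arbitrary units)
    priority: int            # hypothetical convention: lower number = more important


def simulate(transfers, capacity, policy):
    """Fluid simulation of one shared link; returns {transfer name: completion time}."""
    remaining = {t.name: list(t.flow_sizes) for t in transfers}
    finish = {}
    now = 0.0
    while True:
        # A transfer completes only when its last flow has drained.
        for t in transfers:
            if t.name not in finish and all(s <= EPS for s in remaining[t.name]):
                finish[t.name] = now
        active = [t for t in transfers if t.name not in finish]
        if not active:
            return finish
        if policy == "per_flow_fair":
            eligible = active  # every flow competes equally, regardless of transfer
        elif policy == "transfer_priority":
            top = min(t.priority for t in active)
            eligible = [t for t in active if t.priority == top]
        else:
            raise ValueError(f"unknown policy: {policy}")
        flows = [(t.name, i) for t in eligible
                 for i, s in enumerate(remaining[t.name]) if s > EPS]
        rate = capacity / len(flows)  # equal share for each scheduled flow
        # Advance time until the smallest scheduled flow finishes.
        step = min(remaining[n][i] for n, i in flows) / rate
        now += step
        for n, i in flows:
            remaining[n][i] = max(0.0, remaining[n][i] - rate * step)


# Hypothetical workload: A is high priority with one flow; B has three flows.
transfers = [
    Transfer("A", [4.0], priority=0),
    Transfer("B", [4.0, 4.0, 4.0], priority=1),
]
fair = simulate(transfers, capacity=1.0, policy="per_flow_fair")
prio = simulate(transfers, capacity=1.0, policy="transfer_priority")
print(fair)  # {'A': 16.0, 'B': 16.0}
print(prio)  # {'A': 4.0, 'B': 16.0}
```

In this toy setting, per-flow fairness rewards the transfer with more flows. Prioritizing `A` at the transfer level finishes it four times sooner without delaying `B`'s completion. These numbers come only from the made-up workload and are not the measured 1.7X result above.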
Citation
Chowdhury, M., Zaharia, M., Ma, J., Jordan, M. I., & Stoica, I. (2011). Managing data transfers in computer clusters with Orchestra. Computer Communication Review, 41(4), 98–109. https://doi.org/10.1145/2043164.2018448