Managing data transfers in computer clusters with Orchestra

Mosharaf Chowdhury; Matei Zaharia; Justin Ma; Michael I. Jordan; Ion Stoica

Journal Article

Computer Communication Review (2011) 41(4) 98-109

DOI: 10.1145/2043164.2018448

147 Citations

266 Readers

Abstract

Cluster computing applications like MapReduce and Dryad transfer massive amounts of data between their computation stages. These transfers can have a significant impact on job performance, accounting for more than 50% of job completion times. Despite this impact, there has been relatively little work on optimizing the performance of these data transfers, with networking researchers traditionally focusing on per-flow traffic management. We address this limitation by proposing a global management architecture and a set of algorithms that (1) improve the transfer times of common communication patterns, such as broadcast and shuffle, and (2) allow scheduling policies at the transfer level, such as prioritizing a transfer over other transfers. Using a prototype implementation, we show that our solution improves broadcast completion times by up to 4.5X compared to the status quo in Hadoop. We also show that transfer-level scheduling can reduce the completion time of high-priority transfers by 1.7X.

Cite (APA)

Chowdhury, M., Zaharia, M., Ma, J., Jordan, M. I., & Stoica, I. (2011). Managing data transfers in computer clusters with Orchestra. Computer Communication Review, 41(4), 98–109. https://doi.org/10.1145/2043164.2018448
