Masking failures from application performance in data center networks with shareable backup

Abstract

Shareable backup is an economical and effective way to mask failures from application performance. A small number of backup switches are shared network-wide for repairing failures on demand so that the network quickly recovers to its full capacity without applications noticing the failures. This approach avoids the complications and ineffectiveness of rerouting. We propose ShareBackup as a prototype architecture to realize this concept and present the detailed design. We implement ShareBackup on a hardware testbed. Its failure recovery takes merely 0.73 ms, causing no disruption to routing, and it accelerates Spark and Tez jobs by up to 4.1× under failures. Large-scale simulations with real data center traffic and a failure model show that ShareBackup reduces the percentage of job flows prolonged by failures from 47.2% to as little as 0.78%. In all our experiments, the results for ShareBackup differ little from the no-failure case.
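The core idea in the abstract, a small pool of spare switches shared by many production switches and assigned on demand, can be illustrated with a minimal sketch. This is not the ShareBackup design itself: the class `BackupPool`, its methods, the switch names, and the rule that a spare returns to the pool after repair are all hypothetical, chosen only to show how a shared pool differs from one dedicated backup per switch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BackupPool:
    """Hypothetical pool of spare switches shared by many production switches."""

    spares: List[str]
    # Maps each failed production switch to the spare standing in for it.
    in_use: Dict[str, str] = field(default_factory=dict)

    def repair(self, failed_switch: str) -> Optional[str]:
        """Assign a spare to a failed switch; return None if no spare is free."""
        if failed_switch in self.in_use:
            # Already covered by a spare; nothing more to do.
            return self.in_use[failed_switch]
        if not self.spares:
            return None
        spare = self.spares.pop()
        self.in_use[failed_switch] = spare
        return spare

    def release(self, repaired_switch: str) -> None:
        """Return the spare to the pool (an assumed replenishment policy)."""
        spare = self.in_use.pop(repaired_switch, None)
        if spare is not None:
            self.spares.append(spare)


# Two spares cover three production switches (all names are made up).
pool = BackupPool(spares=["backup-1", "backup-2"])
first = pool.repair("edge-7")
second = pool.repair("agg-3")
third = pool.repair("core-1")   # pool is exhausted at this point
pool.release("edge-7")          # edge-7 is fixed, so its spare is free again
fourth = pool.repair("core-1")
```

The sketch only captures the bookkeeping of sharing. How a spare is physically wired in place of a failed switch, and how that happens within the reported 0.73 ms, is covered by the paper's detailed design and is not modeled here.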

Cite

Wu, D., Huang, X. S., Xia, Y., Dzinamarira, S., Sun, X. S., & Ng, T. S. E. (2018). Masking failures from application performance in data center networks with shareable backup. In SIGCOMM 2018 - Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication (pp. 176–190). Association for Computing Machinery. https://doi.org/10.1145/3230543.3230577
