Improving Network Availability with Protective ReRoute

Abstract

We present PRR (Protective ReRoute), a transport technique for shortening user-visible outages that complements routing repair. It can be added to any transport to provide benefits in multipath networks. PRR responds to flow connectivity failure signals, e.g., retransmission timeouts, by changing the FlowLabel on packets of the flow, which causes switches and hosts to choose a different network path that may avoid the outage. To enable it, we shifted our IPv6 network architecture to use the FlowLabel, so that hosts can change the paths of their flows without application involvement. PRR is deployed fleetwide at Google for TCP and Pony Express, where it has been protecting all production traffic for several years. It is also available to our Cloud customers. We find it highly effective for real outages. In a measurement study on our network backbones, adding PRR reduced the cumulative region-pair outage time for RPC traffic by 63-84%. This is the equivalent of adding 0.4-0.8 "nines" of availability.
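
The mechanism and the closing arithmetic can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the CRC-based `pick_path` stand-in for switch hashing, and the set of failed paths are all assumptions made for illustration. The only facts taken from outside the abstract are that the IPv6 FlowLabel is a 20-bit field and that "nines" of availability are conventionally counted as the negative base-10 logarithm of unavailability, so cutting outage time by a fraction r adds -log10(1 - r) nines.

```python
import math
import random
import zlib

FLOW_LABEL_BITS = 20  # width of the IPv6 FlowLabel header field


def new_flow_label(current: int, rng: random.Random) -> int:
    """Draw a random 20-bit label that differs from the current one."""
    while True:
        candidate = rng.getrandbits(FLOW_LABEL_BITS)
        if candidate != current:
            return candidate


def pick_path(flow_label: int, num_paths: int) -> int:
    """Hypothetical stand-in for multipath hashing in switches:
    map a FlowLabel to one of num_paths equal-cost paths."""
    return zlib.crc32(flow_label.to_bytes(3, "big")) % num_paths


def repath_until_connected(failed_paths, num_paths, max_timeouts, rng):
    """Toy model of the idea in the abstract: each simulated
    retransmission timeout triggers a FlowLabel change, which may move
    the flow onto a working path.
    Returns (label, timeouts_used), or (None, max_timeouts) on failure."""
    label = rng.getrandbits(FLOW_LABEL_BITS)
    for timeouts in range(max_timeouts + 1):
        if pick_path(label, num_paths) not in failed_paths:
            return label, timeouts
        label = new_flow_label(label, rng)  # timeout fired: change label
    return None, max_timeouts


def added_nines(outage_reduction: float) -> float:
    """Nines of availability gained when outage time shrinks by the
    given fraction (0 <= outage_reduction < 1)."""
    return -math.log10(1.0 - outage_reduction)


if __name__ == "__main__":
    rng = random.Random(0)
    label, timeouts = repath_until_connected({0, 1}, 4, 64, rng)
    print(f"connected with label {label:#07x} after {timeouts} timeout(s)")
    for r in (0.63, 0.84):
        print(f"{r:.0%} less outage time adds {added_nines(r):.2f} nines")
```

Under this convention, the reported 63% and 84% reductions correspond to roughly 0.43 and 0.80 added nines, which is consistent with the 0.4-0.8 range quoted in the abstract.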

Cite

Wetherall, D., Kabbani, A., Jacobson, V., Winget, J., Cheng, Y., Morrey, C. B., … Vahdat, A. (2023). Improving Network Availability with Protective ReRoute. In SIGCOMM 2023 - Proceedings of the ACM SIGCOMM 2023 Conference (pp. 684–695). Association for Computing Machinery, Inc. https://doi.org/10.1145/3603269.3604867