Coercing Clients into Facilitating Failover for Object Delivery
2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN) (2011)
- ISBN: 9781424492312
- DOI: 10.1109/DSN.2011.5958215
Wyatt Lloyd, Michael J. Freedman
Princeton University
Abstract—Application-level protocols used for object delivery,
such as HTTP, are built atop TCP/IP and inherit its host-to-host
abstraction. Given that these services are replicated
for scalability, this unnecessarily exposes failures of individual
servers to their clients. While changes to both client and server
applications can be used to mask such failures, this paper
explores the feasibility of transparent recovery for unmodified
object delivery services (TRODS).
The key insight in TRODS is cross-layer visibility and control:
TRODS carefully derives reliable storage for application-level
state from the mechanics of the transport layer. This
state is used to reconstruct object delivery sessions, which are
then transparently spliced into the client’s ongoing connection.
TRODS is fully backwards-compatible, requiring no changes to
the clients or server applications. Its performance is competitive
with unmodified HTTP services, providing nearly identical
throughput while enabling timely failover.
I. INTRODUCTION
Ideally, a client’s interaction with a replicated service will
fail only when the service fails. Yet most Internet services
tie the fate of a client’s connection to a single server,
because they are built using TCP and inherit its host-to-host
bindings. If this single server fails, the client’s connection
breaks, and it appears to the client that the service has
failed. However, if a new server can transparently fail over
the connection (that is, interact with the client exactly as
the original server would have), the client’s connection can
continue uninterrupted, with the client unaware of the failure.
We aim to enable failover for a large class of Internet
services, called object delivery services, that play an integral
role in users’ online experiences by giving clients read-only
access to content objects, such as webpages, images,
and videos. Object delivery services are typically replicated
for scalability and fault-tolerance, e.g., there are tens to
thousands of servers that all deliver the same set of objects.
If one such server fails while delivering an object, another
server has the potential to continue delivering it. This paper
demonstrates that such recovery can be done transparently,
effectively, and practically.
Our system, Transparent Recovery for Object Delivery
Services (TRODS), has been designed with the goal of
immediate deployability, which introduces two challenges.
Clients of the service should not be modified: They are
often not under the service’s control and often run different
applications, browsers, and operating systems. Similarly, the
server’s application code should not be modified: Source
code may be unavailable, and application changes would
require integration effort for every service that seeks failover.
Instead, TRODS is implemented as a server-side kernel
module and requires no changes to the client or application.
At a high level, TRODS operates by ensuring that, at
failover time, a recovery server has the minimal application-level
information necessary to continue a connection. This
information is preserved in two ways. First, it can be
retransmitted by the client to its recovery server. TRODS
does not modify the client to accomplish this; instead, it
leverages its on-path position within the server’s kernel to
manipulate a connection’s TCP packets, in order to coerce
the client into retransmitting the information to the new
server. Second, the information can be saved to a persistent
store that will survive the failure of the original server.
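The first mechanism can be illustrated with a toy model. The text above does not say which packet manipulation TRODS uses, so the sketch below assumes one plausible technique: lowering the acknowledgment number on outgoing packets so that the client never sees its request acknowledged. Standard TCP senders keep unacknowledged bytes buffered and resend them on timeout, which is the behavior the sketch relies on. The names `ClientTcp`, `clamp_ack`, and `bytes_seen_by_recovery_server` are hypothetical, and the model ignores sequence-number wraparound, segmentation, and how a recovery server would come to receive the client's packets.

```python
from dataclasses import dataclass


@dataclass
class ClientTcp:
    """Toy TCP sender: bytes stay buffered until the peer acknowledges them."""
    send_buffer: bytes = b""
    acked: int = 0  # count of leading bytes acknowledged so far

    def send(self, data: bytes) -> bytes:
        self.send_buffer += data
        return data

    def on_ack(self, ack_no: int) -> None:
        # Cumulative ACK: never move backwards, never past what was sent.
        self.acked = max(self.acked, min(ack_no, len(self.send_buffer)))

    def on_retransmit_timeout(self) -> bytes:
        # Standard behavior: resend everything not yet acknowledged.
        return self.send_buffer[self.acked:]


def clamp_ack(ack_no: int, hold_at: int) -> int:
    """Hypothetical on-path rewrite: never acknowledge past `hold_at`."""
    return min(ack_no, hold_at)


def bytes_seen_by_recovery_server(request: bytes, coerce: bool) -> bytes:
    """Return what the client resends after the original server fails."""
    client = ClientTcp()
    client.send(request)
    ack_no = len(request)          # what the server's TCP stack would ACK
    if coerce:
        ack_no = clamp_ack(ack_no, hold_at=0)  # assumed manipulation
    client.on_ack(ack_no)
    # The original server fails here; the client's retransmit timer fires.
    return client.on_retransmit_timeout()
```

With `coerce=True` the function returns the full request, so a recovery server would learn which object was asked for without any client change. With `coerce=False` it returns an empty byte string, because the client has already discarded the acknowledged request.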
We describe two complementary versions of TRODS
that use different resources as persistent stores. The first
version, TRODS-KV, uses a key-value store for persistence.
It improves on previous failover schemes by requiring only
a single remote operation apart from the original server (a
single save to the key-value store) to guarantee any subsequent
connection failover. The second version, TRODS-TS,
eliminates the need for any remote operations by carefully
repurposing the TCP timestamp option that accompanies
every packet in a connection as the persistent store. These
two approaches are complementary: TRODS-KV is more
general purpose, handles more abnormal object delivery
scenarios, and avoids some additional security concerns.
On the other hand, TRODS-TS has very low overhead and
requires no additional physical resources for deployment.
Deployed together, TRODS-TS can serve the highly popular objects
of a service, while TRODS-KV can handle the unpopular
and exceptional cases.
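The two persistent stores can be contrasted in a short sketch. It relies on one fact about TCP: the Timestamps option (kind 8, length 10) carries a 32-bit TSval chosen by the sender and a 32-bit TSecr in which the receiver echoes the peer's most recent TSval. A server can therefore place 32 bits in TSval and expect an unmodified client to send them back. Everything else here is an assumption made for illustration: the split of those 32 bits into `object_id` and `aux`, the function names, and the dictionary standing in for a remote key-value store. The sketch also ignores the constraints that timestamps must still satisfy for the rest of TCP to work, which is presumably what "carefully" refers to in the text.

```python
import struct
from typing import Dict, Tuple

OBJECT_BITS = 20  # hypothetical layout of the 32-bit TSval
AUX_BITS = 12


def pack_tsval(object_id: int, aux: int) -> int:
    """Pack two hypothetical fields into one 32-bit timestamp value."""
    if not 0 <= object_id < (1 << OBJECT_BITS):
        raise ValueError("object_id does not fit")
    if not 0 <= aux < (1 << AUX_BITS):
        raise ValueError("aux does not fit")
    return (object_id << AUX_BITS) | aux


def unpack_tsecr(tsecr: int) -> Tuple[int, int]:
    """Recover the fields from the value the client echoed back."""
    return tsecr >> AUX_BITS, tsecr & ((1 << AUX_BITS) - 1)


def timestamp_option(tsval: int, tsecr: int) -> bytes:
    """Encode a TCP Timestamps option: kind=8, length=10, TSval, TSecr."""
    return struct.pack("!BBII", 8, 10, tsval, tsecr)


def parse_timestamp_option(raw: bytes) -> Tuple[int, int]:
    kind, length, tsval, tsecr = struct.unpack("!BBII", raw)
    if kind != 8 or length != 10:
        raise ValueError("not a Timestamps option")
    return tsval, tsecr


# TRODS-KV style: one save to a store that outlives the original server.
Conn = Tuple[str, int, str, int]  # (client ip, client port, server ip, server port)
kv_store: Dict[Conn, bytes] = {}  # stand-in for a remote key-value store


def save_once(conn: Conn, request: bytes) -> None:
    kv_store[conn] = request      # the single remote operation


def recover_from_kv(conn: Conn) -> bytes:
    return kv_store[conn]


# TRODS-TS style: no remote operation; the client carries the state.
def client_echo(server_option: bytes, client_clock: int) -> bytes:
    """What an unmodified client does: echo the server's TSval as TSecr."""
    server_tsval, _ = parse_timestamp_option(server_option)
    return timestamp_option(client_clock, server_tsval)


def recover_from_ts(client_option: bytes) -> Tuple[int, int]:
    _, tsecr = parse_timestamp_option(client_option)
    return unpack_tsecr(tsecr)
```

In this model, the key-value path stores arbitrary bytes at the cost of one remote write per connection, while the timestamp path costs nothing extra but holds only 32 bits. That difference matches the division of labor described above, with TRODS-TS for popular objects and TRODS-KV for the cases that do not fit.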
This paper focuses on the use of HTTP as the canonical
and ubiquitous protocol for object delivery. However, we
believe that TRODS’ approach is similarly applicable to
other protocols for object delivery.
TRODS has significantly lower overhead than previous
transparent failover schemes. Several of these schemes require
primary and backup servers to process requests in
parallel, e.g., FT-TCP (hot backup) [24] and ST-TCP [14].
This redundant processing reduces the systems’ throughput
per machine by at least 50%. Other prior schemes that
avoid an active backup—e.g., FT-TCP (cold backup) and
CoRAL [1]—still require many remote operations to save
state so it can be replayed at recovery time. In contrast,
TRODS-KV needs only a single remote operation and
TRODS-TS eliminates them altogether.
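The 50% figure follows from simple accounting, sketched below under the assumption that each request is processed by exactly one primary and one backup and that the duplication adds no other cost. The function names and the example rate are hypothetical. Any extra coordination cost would push the reduction above this bound, which is consistent with the phrase "at least".

```python
def per_machine_throughput(standalone_rps: float, copies: int) -> float:
    """Useful requests per second per machine when `copies` machines
    each process every request (copies=1 means no redundant work)."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return standalone_rps / copies


def throughput_reduction(copies: int) -> float:
    """Fraction of per-machine throughput lost to redundant processing."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - 1.0 / copies
```

For a machine that could serve 1000 requests per second alone, `per_machine_throughput(1000, 2)` gives 500.0 and `throughput_reduction(2)` gives 0.5.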


