Democratizing Content Distribution

by Michael Joseph Freedman
(2007)

Democratizing Content Distribution
Michael Joseph Freedman
A dissertation submitted in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Department of Computer Science
Courant Institute of Mathematical Sciences
New York University
September 2007
Professor David Mazières
© Michael Joseph Freedman
All Rights Reserved, 2007
“Behold how good and how pleasant it is for
brethren to dwell together in unity.”
— Psalms 133-1
To my future wife Jennifer
for her warmth and support
and
To my brother Daniel
for his courage of convictions
Acknowledgments
My advisor, David Mazières, always insisted that I pursue research that gives me joy. He certainly
provides a model to emulate. David is tenacious with his questions, precise with his research, and
unwavering in his support. I would not have had the same view of and success in research without
his guidance over the years. I would also like to thank Robert Morris and Frans Kaashoek for
helping to place my feet on the road to academia.
This thesis is based on research I have performed over the past five years with a number of
great colleagues, including Siddhartha Annapureddy, Christina Aperjis, Eric Freudenthal, Ramesh
Johari, Maxwell Krohn, Karthik Lakshminarayanan, David Mazières, Sean Rhea, and Ion Stoica.
Graduate school was an exciting and stimulating experience due to their involvement. I would
also like to thank the other two members of my dissertation committee, Dennis Shasha and Laksh-
minarayanan Subramanian, for their advice.
Over my time spent at MIT, NYU, and Stanford, I had the opportunity to collaborate with and
learn from many excellent peers. The intellectual variety and milieu provided by each environ-
ment was invaluable to both the content and style of my research. Beyond those already mentioned
above, I would also especially like to thank David Andersen, Martin Casado, Nick Feamster, Nick
McKeown, Antonio Nicolosi, Benny Pinkas, and Scott Shenker for being research mentors, sound-
ing boards, and (in one case) a partner in technology transfer to industry.
The research in this dissertation was funded through an NDSEG Graduate Research Fellowship
and the IRIS project (http://project-iris.net/), supported by the National Science Founda-
tion under Cooperative Agreement Number ANI-0225660.
Finally, I wish to give heartfelt thanks to my family, for their abiding support, dedication, and
love: my grandparents Estelle, Gerald of blessed memory, Janice, and Saul, my parents Nancy and
Louis, my brother Daniel, and my future wife Jennifer. May good countenance continue to shine
upon them all.
Bibliographic Notes
This thesis is based on research I have performed with a number of excellent colleagues between
2002 and 2007. The consideration of non-transitive routing failures presented in §2.2.4 appears in
a paper co-authored with Karthik Lakshminarayanan, Sean Rhea, and Ion Stoica [60]. An initial
design for the indexing and CDN system described in §2.3 and §3 appears in papers co-authored
with David Mazières [58, 59] and Eric Freudenthal [59]. Material for the anycast system in §4
appears in [62], co-authored with Karthik Lakshminarayanan and David Mazières. The file-system
design in §5 appears in a paper co-authored with Siddhartha Annapureddy and David Mazières [8],
while the erasure-code authentication protocol in §6 is co-authored with Maxwell Krohn and David
Mazières [104]. Finally, the ongoing work on market incentives discussed in §7 is with Christina
Aperjis and Ramesh Johari [63].
More information about the systems we present in this thesis—including papers, source code,
software plugins, and system usage and integration instructions—can be found at the website
http://www.coralcdn.org/.
Abstract
In order to reach their large audiences, today’s Internet publishers primarily use content distribution
networks (CDNs) to deliver content. Yet the architectures of the prevalent commercial systems
are tightly bound to centralized control, static deployments, and trusted infrastructure, inherently
limiting their scope and scale to ensure cost recovery.
To move beyond such shortcomings, this thesis contributes a number of techniques that realize
cooperative content distribution. By federating large numbers of unreliable or untrusted hosts,
we can satisfy the demand for content by leveraging all available resources. We propose novel
algorithms and architectures for three central mechanisms of CDNs: content discovery (where are
nearby copies of the client’s desired resource?), server selection (which node should a client use?),
and secure content transmission (how should a client download content efficiently and securely
from its multiple potential sources?).
These mechanisms have been implemented, deployed, and tested in production systems that
have provided open content distribution services for more than three years. Every day, these sys-
tems answer tens of millions of client requests, serving terabytes of data to more than a million
people.
This thesis presents five systems related to content distribution. First, Coral provides a dis-
tributed key-value index that enables content lookups to occur efficiently and returns references to
nearby cached objects whenever possible, while still preventing any load imbalances from form-
ing. Second, CoralCDN demonstrates how to construct a self-organizing CDN for web content
out of unreliable nodes, providing robust behavior in the face of failures. Third, OASIS provides a
general-purpose, flexible anycast infrastructure, with which clients can locate nearby or unloaded
instances of participating distributed systems. Fourth, as a more clean-slate design that can lever-
age untrusted participants, Shark offers a distributed file system that supports secure block-based
file discovery and distribution. Finally, our authentication code protocol enables the integrity veri-
fication of large files on-the-fly when using erasure codes for efficient data dissemination.
Taken together, this thesis provides a novel set of tools for building highly-scalable, efficient,
and secure content distribution systems. By enabling the automated replication of data based on
its popularity, we can make desired content available and accessible to everybody. And in effect,
democratize content distribution.
5.4 Shark session establishment protocol
5.5 Shark microbenchmarks. Normalized application performance for various types of
file-system access. Execution times in seconds appear above the bars.
5.6 Client latency. Time (seconds) for ~100 LAN hosts (Emulab) to finish reading a 10
MB and 40 MB file.
5.7 Proxy bandwidth usage. Upstream bandwidth (MBs) served by each Emulab proxy
when reading 40 MB (Emulab and PlanetLab) and 10 MB (Emulab) files.
5.8 Client latency. Time (seconds) for 185 wide-area hosts (on PlanetLab) to finish
reading a 40 MB file using Shark and SFS.
6.1 Online encoding of a five-block file. $b_i$ are message blocks, $a_1$ is an auxiliary
block, and $c_i$ are check blocks. Edges represent addition (via XOR). For example,
$c_4 = b_2 + b_3 + b_5$, $a_1 = b_3 + b_4$, and $c_7 = a_1 + b_5$.
6.2 Number of blocks recoverable as function of number of blocks received. Data
collected over 50 random encodings of a 10,000 block file.
6.3 System parameters and properties for securing erasure codes
6.4 Picking secure generators. The seed s can serve as an heuristic “proof” that the
hash parameters were chosen honestly. This algorithm is based on that given in
the NIST Standard [56]. The notation G(x) should be taken to mean that the
pseudo-random number generator G outputs the next number in its pseudo-random
sequence, scaled to the range $\{0, \ldots, x-1\}$.
6.5 Homomorphic hashing microbenchmarks
6.6 Comparison of hash generation performance
6.7 Comparison of additional storage requirements for mirrors
6.8 Comparison of bandwidth utilization
6.9 Comparison of per-block verification performance
far-flung, unreliable, or untrusted—we can automatically replicate content in proportion to its pop-
ularity. Such systems would thus, in effect, democratize content distribution.
This thesis describes mechanisms and systems that help realize this goal. While we propose
and evaluate novel designs towards this end, our approach is not merely academic: Several of the
systems we describe have been publicly deployed and in widespread use for several years. Not
only do they provide open functionality that other distributed services can leverage, but they are
also directly used by end-users—to the tune of more than a million clients and tens of thousands of
content publishers each day—and thus enable the dissemination of content that may be otherwise
unavailable.
1.1 Whither the democratization of content?
The journalist Walter Winchell once said that “Today’s gossip is tomorrow’s headline.” This is even
more true since the Internet has helped break down communication barriers and enable the rapid
dissemination of information. Messages propagate rapidly through e-mail and other messaging
layers, websites boom and bust in popularity, videos and other multimedia content spread virally,
and so forth. This ever-changing popularity of content leads to heterogeneous and dynamic traffic
demands from users, sometimes spiking into a sudden and often unexpected burst of user requests,
a so-called flash crowd.
At least on the World Wide Web, this dynamism is aided by the largely world-readable open-
ness of content and the nature of hyperlinks. Acquaintances can quickly point others to content
by relating a short Uniform Resource Locator (URL). Websites, portals, and weblogs commonly
include such URLs to external sites as well, causing their own users to access these sites through
explicit or embedded links.
While this type of information dissemination enables some content to attain overnight celebrity,
it poses a danger as well. Sites hosting such content risk sudden overload following publicity, as
their resources are insufficient to satisfy the new demand from these flash crowds of users sud-
denly seeking to download their content. These jumps in traffic may be several orders of magni-
carefully deciding where to place clusters of such servers in the network. For example, the Lime-
Light network is currently comprised of fifteen data centers in the United States, Western Europe,
Japan, and China, while Akamai has established many smaller clusters to leverage network peer-
ing arrangements in order to reduce their bandwidth costs. The advantage of this approach is that
the network provides a trusted, known, and more predictable infrastructure with which to serve
content. The downside is that CDN providers’ costs often scale linearly with their capacity—i.e.,
in terms of the cost of servers, bandwidth, and power—and these costs are thus passed back to
customers in the form of mostly-linear pricing schedules.
So where does this leave a content publisher that experiences a sudden jump in traffic but,
because its normal traffic is low, has no preexisting service contract with a commercial CDN?
Or a publisher whose demand for content is predictable or regular yet still high, and who
cannot afford these CDN prices? They are left with their content,
quite literally, “in the dark.”
Fortunately, there is a way for popular data to reach many more people than publishers can
afford to serve themselves: other parties can cooperatively mirror the data on their own servers
and networks. There are many reasons why parties will choose to contribute resources for content
distribution. Indeed, the Internet has a long history of altruistic volunteer organizations with good
network connectivity mirroring data they consider to be of value. More recently, peer-to-peer file
sharing has demonstrated the willingness of even individual broadband users to dedicate upstream
bandwidth to redistribute content the users themselves enjoy, in order to improve the global good.
There are less altruistic reasons for participating in such a system as well. By collectively
performing content distribution, content publishers can perform the equivalent of time-sharing on
their network and server resources: they can provision their sites for steady-state traffic patterns,
yet weather any traffic spikes by spreading load among other participating resources. Furthermore,
organizations that mirror popular content reduce their own downstream bandwidth utilization and
improve the latency for local users accessing the mirror. Finally, one can build in mechanisms
that incentivize users to contribute upstream resources, by providing better quality-of-service to
contributors when resources are otherwise scarce.
We posit that a cooperative CDN should not attempt to control the placement of data, but rather
provide a content discovery layer that can locate content already cached by participants as a side-
effect of their local access patterns: i.e., index and manage location meta-data, not data itself.
This approach allows different nodes to implement various caching and security policies. Some
might provide unrestricted proxy service for the wider Internet population. Others, however, may
be configured only to cache content requested by their local clients, but also provide upstream
bandwidth to cooperatively serve other proxies (i.e., Step 3 in Figures 1.1 and 1.3).
While content may have an authoritative source, do not assume that it is reliable or sufficiently
provisioned. As our goal is to ensure that popular content remains available, content publishers
with insufficient capacity at their disposal must be spared from a large demand for their resources.
A cooperative CDN designed for this purpose should minimize load at the publisher’s origin server
whenever possible. To do so, the system’s content discovery index should enable cooperation
between all participants, such that if a piece of content is available anywhere in the network, it can
be efficiently discovered and transmitted.
The importance of global discovery also arises in cases when sudden load to a publisher’s
origin site would otherwise make it unavailable: Once any content—even if seemingly-unpopular,
perhaps, because a flash crowd to the CDN has only begun—is cached within the CDN, it remains
available to subsequent client requests.
Scale to Internet-size client populations and be robust to failures. No single or few nodes are
sufficiently reliable or scalable to maintain the system’s entire location meta-data index. Thus,
both the content discovery index and the cooperative data cache should be distributed over large
numbers of participants, in order to avoid any central point-of-failure or system bottlenecks.
Leverage participants wherever they may be found. While centrally-planned and -provisioned
systems may be built around well-formed and well-known clusters—and grow by deploying addi-
tional clusters or adding new nodes to existing clusters—a cooperative system has only the peers
that choose to participate. As such, the location of these participants may be unknown, they may
differ greatly in network capacity, they may be quite remote from other participants,
currently serve several terabytes of data to more than a million users every day. The view of the
Internet gained from deploying these systems provided important insights that helped evolve our
systems’ designs.
The first half of this thesis describes content distribution systems that are backwards-compatible
with today’s servers and clients. As such, they can immediately benefit underprovisioned content
publishers seeking to reach large audiences. On the other hand, these techniques are limited in
their ability to leverage untrusted participants, given a lack of end-to-end security mechanisms for
unmodified clients. Thus, in the second half, we consider a more clean-slate approach to building
CDN services, tackling the problem of securing content transmission between distrustful peers.
More specifically, this thesis is organized as follows.
Chapter 2 – Content discovery and Coral. We start with the problem of content discovery: Given a
request for a particular data object, how can one determine whether and where that object re-
sides within the system? This chapter introduces a distributed key-value index well-suited for
large, cooperative CDNs. This distributed index—which maps objects to the nodes caching
them—is designed to replace the centralized functionality described in §1.1.
We start by reviewing the basic indexing approach provided by consistent hashing, and then
discuss the more scalable algorithms of distributed hash tables (DHTs). After describing
how these proposals break under realistic routing conditions and how to fix them [60], we
argue why the traditional key-value stores as proposed in the literature are ill-suited for
building cooperative CDNs and the like. In response, we propose a new abstraction, called
a distributed sloppy hash table (DSHT), that weakens their consistency model to yield a
better-suited distributed index [58].
This chapter describes and experimentally validates the design of Coral, a distributed index-
ing layer that couples these weaker DSHT semantics with a hierarchical, locality-optimized
structure. Coral enables lookups to occur efficiently and returns references to nearby cached
objects whenever possible, while still preventing hot-spots in the indexing layer, even under
degenerate loads. As such, Coral implements many of the design points we argued for in
§1.3: it scales to Internet-size populations while not controlling data placement, it is robust
to failure and self-organizing, it can handle great load imbalances to particular data items
(named by keys in the index), and it yields low discovery and transmission latency.
Chapter 3 – CoralCDN. Leveraging the distributed indexing functionality of Coral, we next describe
the design and implementation of CoralCDN [59], a decentralized, self-organizing web con-
tent distribution network. To use CoralCDN, some party simply appends “.nyud.net” to
the hostname in any URL. Through DNS redirection, oblivious clients accessing this so-
called “Coralized” URL are transparently redirected to nearby CoralCDN web caches. As
such, CoralCDN is trivial to integrate, and thus can—and does—provide immediate relief
for otherwise-overloaded sites.
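The URL rewriting described above can be illustrated with a minimal Python sketch. It only performs the hostname change named in the text (appending “.nyud.net”); the function name `coralize` and the choice to keep an explicit port unchanged are assumptions made for illustration, not details taken from CoralCDN itself.

```python
from urllib.parse import urlsplit, urlunsplit

CORAL_SUFFIX = ".nyud.net"  # suffix named in the text


def coralize(url: str) -> str:
    """Return `url` with the CoralCDN suffix appended to its hostname."""
    parts = urlsplit(url)
    host = parts.hostname  # lowercased hostname, without port or userinfo
    if host is None:
        raise ValueError(f"URL has no hostname: {url!r}")
    if host.endswith(CORAL_SUFFIX):
        return url  # already Coralized
    netloc = host + CORAL_SUFFIX
    # Assumption: an explicit port, if present, is carried over unchanged.
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


print(coralize("http://www.example.com/index.html"))
```

Because only the hostname changes, a publisher or any third party can produce such a link by hand; the sketch simply makes the transformation explicit.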
Upon receiving client requests, CoralCDN caching proxies cooperate to transfer data from
nearby peers whenever possible, dissipating most of the traffic to a URL’s origin server and
enabling its content to reach much larger audiences. CoralCDN’s use of the Coral indexing
layer both enables nodes to efficiently find content without querying distant nodes and avoids
any load imbalances in its overlay and proxy infrastructures.
CoralCDN has been deployed on 300-400 servers for over three years. Each day, it serves
several terabytes of web content to over one million unique client IPs, spread across an
average of 20 to 25 million requests. This chapter concludes by describing robustness, fair-
ness, and security mechanisms—learned through CoralCDN’s deployment—introduced to
improve its effectiveness in real operating environments. We also discuss the failures of
CoralCDN’s original DNS redirection service, which motivated the design of our next sys-
tem.
Chapter 4 – Server selection and OASIS. We next turn to the problem of server selection: Given a set
of CoralCDN proxies, which one should a browser access to maximize performance? But
rather than construct another CoralCDN-specific service, we seek to build a general service
that can support many distributed systems simultaneously, with marginal additional cost per
system.
This chapter presents OASIS [62], a shared locality- and load-aware server-selection infras-
tructure. Since OASIS is shared across many application services, it amortizes deployment
and network measurement costs to services that adopt it. Yet to facilitate such sharing, OA-
SIS has to maintain network locality information in an application-independent way. OA-
SIS achieves these goals by mapping different portions of the Internet in advance (based
on IP prefixes) to the geographic coordinates of the nearest known landmark, using sim-
ple constraint satisfaction tests to ensure that measurement results are not (accidentally or
maliciously) erroneous. OASIS is designed explicitly to incentivize cooperation: the more
distributed services that adopt OASIS, the more vantage points OASIS has with which to
perform network measurement, which in turn leads to improved anycast accuracy and de-
creased cost per participating service.
OASIS again demonstrates the facility of distributed key-value indexing techniques intro-
duced in §2, which it uses to store information and assign tasks—which network prefixes
to probe, which servers belong to which distributed systems, where prefixes are located—in
a fully-decentralized manner. OASIS has been publicly deployed for two years, currently
providing anycast service for about a dozen distributed systems.
CoralCDN and OASIS focus on building cooperative services for unmodified clients, with the goal
of immediate deployability and widespread benefit.
What are the implications of this backwards compatibility? First, the systems are limited in
their ability to explore more advanced methods of content transmission directly to end clients: from
chunk-based (or swarming) file distribution (Chapter 5) to erasure-encoded data dissemination
(Chapter 6). Second, and even more important, these unmodified clients cannot readily deal with
malicious proxies. Web security à la SSL, for example, is designed to authenticate servers, not
ensure the integrity of the data they return. A CDN proxy server could thus maliciously modify
content en route to the client without detection. While this latter problem is easy to solve, at
least technically provided that the fraction of malicious proxies is small—namely, origin servers
can cryptographically sign content, and a client can failover to alternative proxies or back to the
smaller subset of peers, these algorithms also include some routing mechanism by which known
intermediate nodes are used to contact the ultimate node responsible for any key.
In §2.3, we argue that the DHT-based storage abstraction that has largely captured the
interest of the academic community, proposed as a building block for higher-level applications,
is actually ill-suited for distributed services such as content distribution: DHTs provide the wrong
abstraction, have poor locality, and control data placement in the network (and thus ignore users’
own preferences and policies).
In response, we present the design of a new key-value indexing layer, called Coral. Section
2.3.1 describes how Coral weakens the semantics of the traditional DHT get and put interfaces, in
order to provide an indexing abstraction better-suited for the type of distributed applications we
seek to build. Section 2.3.2 describes how we extend this basic design to a hierarchical structure,
in order to provide better routing and data locality.
Coral makes several contributions. Its new abstraction with weaker semantics is potentially
of use to any application that needs to locate nearby instances of resources on the network: Indeed,
we later build a CDN for web content (Chapter 3) and a distributed file system (Chapter 5)
on top of Coral. Coral introduces an epidemic clustering algorithm that exploits distributed
network measurements to construct its locality-optimized hierarchical structure in a fully self-
organizing manner. Finally, Coral is the first peer-to-peer key-value index that can scale to many
stores of the same key without hot-spot congestion, based on a combination of new rate-limiting
and constrained-routing techniques.
2.2 A distributed key-value index
This section reviews previous work in consistent hashing (§2.2.1) and distributed hash tables
(§2.2.3) for building key-value indexes. While these mechanisms have been proposed by other
researchers in the literature, our contribution lies in extending these approaches to be more scal-
able and fault-tolerant. Namely, §2.2.2 describes the addition of a gossip-based group membership
protocol in order to minimize the amount of liveness probing necessary to maintain a global membership
view, while §2.2.4 describes an incorrect assumption of previous designs (specifically, a
fully-connected communication graph between all peers) that leads to faulty behavior under routing
failures, as well as our resulting system modifications to better handle real network conditions.
Figure 2.1: Keys (blue squares) are assigned to their successor nodes (circles) on a consistent-
hashing ring. This assignment is illustrated by arrows in the figure.
(Figure labels: 160-bit ID space, from 00...00 to 11...11.)
2.2.1 Consistent hashing with global membership views
Consistent hashing [94] partitions the identifier space of a key-value map over participating nodes
as follows. Each participating node is assigned one or more random identifiers from the space of
integers modulo $2^K$, where identifiers (IDs) are $K$-bit numbers. Each key, also drawn from the
same domain of identifiers, is mapped to the node whose node-ID most immediately follows it
in this ID-space, as shown in Figure 2.1. We thus say that the system implements the successor
relation, and so long as each node in the network knows its predecessor in the ID-space, any node
can compute which keys are mapped onto it. For concreteness, we use the set of integers $[0, 2^{160})$ as
the ID-space in our descriptions, which is well-suited for generating pseudorandomly-distributed
keys by taking the SHA-1 hash [55] of some input (e.g., a node’s IP address).¹
The main benefit offered by consistent hashing over static partitioning—e.g., using some hash
function modulo N where N is the number of nodes—is that consistent hashing minimizes the
disruption caused by nodes joining or leaving the system. If κ is the number of keys in the system,
membership changes only disrupt O(κ/N) keys with high probability when using consistent hashing:
A new node takes responsibility for some subset of its successor’s keys, while a node leaving
the system passes responsibility for its keys onto its successor; no other system reconfiguration
of keys is necessary. On the other hand, membership changes when using modulo-based hashing
disrupt O(κ) keys, leading to significant overhead.
¹ One could certainly use 256-bit identifiers and SHA-256 as an alternative hash function instead, if the
continued security of SHA-1’s collision resistance is a concern.
If every node knows about every other node in the system, each node can easily determine to
what node every key is mapped. Thus, such a structure can support put and get primitives by first
determining the node responsible for the specified key, then directly contacting that node either to
store the inserted value or to fetch the stored value(s).
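As an illustration of the successor relation and of the put and get primitives just described, the following Python sketch builds a consistent-hashing ring under a global membership view and compares the disruption caused by one node joining against hash-modulo-N partitioning. It is a simplified sketch, not the thesis's implementation: the class name `Ring`, the helper names, the node addresses, and the in-memory dictionaries standing in for remote nodes are all assumptions, and each node receives a single identifier.

```python
import bisect
import hashlib


def to_id(name: str) -> int:
    """Map a string to a 160-bit identifier via SHA-1, as in the text."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class Ring:
    """Consistent-hashing ring in which every node knows every other node."""

    def __init__(self, nodes):
        # Sorted (node-ID, node-name) pairs; one ID per node for simplicity.
        self._points = sorted((to_id(n), n) for n in nodes)
        # Assumption: local dicts stand in for storage held at remote nodes.
        self._stores = {n: {} for n in nodes}

    def successor(self, key: str) -> str:
        """Return the node whose ID most immediately follows the key's ID."""
        i = bisect.bisect_left(self._points, (to_id(key), ""))
        return self._points[i % len(self._points)][1]  # wrap around the ring

    def put(self, key: str, value) -> None:
        self._stores[self.successor(key)].setdefault(key, []).append(value)

    def get(self, key: str) -> list:
        return self._stores[self.successor(key)].get(key, [])


def moved_keys(keys, owner_before, owner_after):
    """Keys whose responsible node differs between two assignments."""
    return [k for k in keys if owner_before(k) != owner_after(k)]


nodes = [f"10.0.0.{i}" for i in range(1, 21)]  # hypothetical node addresses
keys = [f"object-{i}" for i in range(5000)]    # hypothetical keys

before, after = Ring(nodes), Ring(nodes + ["10.0.0.21"])
ring_moved = moved_keys(keys, before.successor, after.successor)

# Static partitioning: hash modulo N, with N growing from 20 to 21.
mod_moved = moved_keys(keys, lambda k: to_id(k) % 20, lambda k: to_id(k) % 21)

print(f"consistent hashing moved {len(ring_moved)} of {len(keys)} keys")
print(f"modulo hashing moved     {len(mod_moved)} of {len(keys)} keys")
```

On the ring, every key that changes owner moves to the newly joined node and nowhere else, whereas the modulo scheme reassigns the large majority of keys. With a single identifier per node the size of the new node's share varies considerably from run to run of node names; assigning several identifiers per node, as the text permits, would smooth this out.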
Of course, if reliable storage in this key-value map is desired, the system must replicate key-
value pairs across multiple nodes. Replication ensures that when nodes leave the system, the
key-value pair can still be found at some live node. On the other hand, when a new node joins the
system, it should get a copy of all new key-value pairs for which it will be subsequently responsible
(due to a re-partitioning of the identifier space). The literature includes proposals for both reactive
replication [39, 45] (i.e., upon the detection of node failures) and proactive replication [149] (based
on expected system behavior).
Such a global membership view can either be managed by a central membership server, or
each node can occasionally ping every other node to check its liveness and avoid the need for
centralization. In this decentralized model, a node can join the system if it knows at least one
other existing member, e.g., by fetching that node’s membership list and subsequently contacting
all newly-discovered nodes. Of course, this simple membership approach does not ensure that
each node’s view of the network is consistent. It does, however, provide the property of eventual
consistency, given the use of an unreliable failure detector [32]. Strongly-consistent membership
views, on the other hand, would require heavier-weight protocols based on group consensus (e.g.,
Paxos [106]).
Application designers using such a key-value index should determine whether they require such
strong consistency from the mapping of keys to nodes, or whether they can operate with only even-
tual consistency. The applications we later describe are able to function with the latter, either by
only assuming best-effort reliability by the indexing layer (§3, §5) or by replicating keys at multiple
hosts (§4), such that temporary inconsistencies do not affect the correctness of the system. Thus,
they can avoid the communication and complexity overhead of implementing consensus protocols.
2.2.2 Scaling liveness detection for global membership views
We now describe an alternative liveness detection mechanism that greatly reduces the amount of
network probing in the system. Specifically, to avoid the $O(N^2)$ probing used by an all-pairs
approach, participating nodes detect and share failure information cooperatively.
In this model, every node maintains a weakly-consistent view of all other nodes in the network,
where each node is identified by its IP address and a globally-unique node identifier as before, as
well as a new incarnation number. Each time period—e.g., 3 seconds—every node picks a random
neighbor to probe and, if the neighbor fails its liveness check, uses epidemic gossiping to quickly
propagate its suspicion of failure.
Two techniques suggested by SWIM [48] reduce false failure announcements. First, if the
initiating node’s direct probe of its neighbor fails to elicit a response, it next chooses several in-
termediates to again probe this target. Only if these indirect probes fail as well does the initiator
announce its suspicion of failure. Such indirect probing alleviates the problem caused by partial
network, as opposed to end-host, failures. (We discuss the implications of such non-transitive
failures in §2.2.4.) Second, incarnation numbers help disambiguate failure messages: alive mes-
sages for incarnation i override anything for j < i; suspect for i overrides anything for j ≤ i. If
a node learns that it is suspected of failure, it increments its incarnation number and gossips its
new number as alive (no other party should increment a node’s incarnation number). A node will
only conclude that another node with incarnation i is dead if it has not received a corresponding
alive message for j > i after some time—3 minutes in our implementation—although the node
may treat suspicious nodes differently (e.g., use more aggressive request timeouts when commu-
nicating). This approach provides live nodes with sufficient time to respond to and correct false
suspicions of failure, while still bounding the time that a node failure escapes notice.
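The override rules for incarnation numbers can be sketched as follows. This is a minimal illustration, not the system's implementation: the names Status, overrides, and refute are hypothetical, and the sketch leaves out the timeout that eventually declares a suspected node dead.

```python
from dataclasses import dataclass

ALIVE, SUSPECT = "alive", "suspect"

@dataclass(frozen=True)
class Status:
    state: str        # ALIVE or SUSPECT
    incarnation: int  # only the node itself may increment this

def overrides(new: Status, old: Status) -> bool:
    """Return True if announcement `new` should replace the recorded `old`."""
    if new.state == ALIVE:
        # alive for incarnation i overrides anything for j < i
        return new.incarnation > old.incarnation
    # suspect for incarnation i overrides anything for j <= i
    return new.incarnation >= old.incarnation

def refute(own_incarnation: int) -> Status:
    """A node that learns it is suspected bumps its incarnation and gossips alive."""
    return Status(ALIVE, own_incarnation + 1)
```

Under these rules a suspicion about incarnation 3 replaces an alive record for incarnation 3, and only the refutation produced by the suspected node itself (incarnation 4) clears it.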
To gossip these failure messages, each node maintains a buffer of announcements to be sent (or
piggybacked on other system messages being disseminated throughout the system) to randomly-
chosen nodes. This buffer includes messages both originated by the local node or received from
neighbors for retransmission. Every time a node transmits a message to a random neighbor, it
increments a transmission count. When the message has been gossiped $O(\log N)$ times for $N$-node
networks, the message is removed from this buffer. Such an epidemic algorithm propagates
a message to all nodes in logarithmic time with high probability [48] (see footnote 2).
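A minimal sketch of such a gossip buffer follows. The class name GossipBuffer, the base-2 logarithm, and the constant factor in the retransmission limit are assumptions made for illustration; the text only fixes the limit as logarithmic in the network size.

```python
import math
import random

class GossipBuffer:
    """Holds announcements and retires each one after about log2(N) transmissions."""

    def __init__(self, network_size: int, factor: float = 1.0):
        # The constant factor and the logarithm base are assumptions of this sketch.
        self.limit = max(1, math.ceil(factor * math.log2(max(network_size, 2))))
        self.counts = {}  # message -> number of transmissions so far

    def add(self, message):
        """Buffer a locally originated or received announcement."""
        self.counts.setdefault(message, 0)

    def send_round(self, neighbors, rng=random):
        """Pick one random neighbor and return (neighbor, messages sent to it)."""
        if not neighbors or not self.counts:
            return None, []
        target = rng.choice(neighbors)
        batch = list(self.counts)
        for message in batch:
            self.counts[message] += 1
            if self.counts[message] >= self.limit:
                del self.counts[message]  # gossiped enough times, drop it
        return target, batch
```

In a 1024-node network this sketch transmits each announcement ten times before removing it from the buffer.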
While the all-pairs approach is limited solely by network size, scalability in this gossip-based
approach is limited by a function of the product of failure rate and network size; i.e., it can support
small networks with high failure rates, or it can scale to larger networks of stable nodes. Thus, it
may be better suited for decentralized services deployed on managed infrastructure nodes—as we
use for the anycast system of Chapter 4—than for peer-to-peer services on end-hosts.
2.2.3 Partial membership views with DHTs
This previous approach of ID-space partitioning can be scaled to much larger networks if every
node need only know about a subset of the network’s nodes, as opposed to a global view. Enter the
routing layer proposed for distributed hash tables (DHTs), which we review here. While there are
some algorithmic differences between the various proposals [116, 150, 162, 177], they all share the
same basic approach.
As in consistent hashing, a DHT assigns every key in the ID-space to a node based on some
distance metric. In consistent hashing and Chord [177], this distance metric is simply the successor
relationship across integers modulo $2^K$ for $K$-bit keys. Kademlia [116] defines the distance
between two values in the ID-space to be their bitwise exclusive or (XOR), interpreted as an un-
signed integer; thus, IDs with longer matching prefixes (of most significant bits) are numerically
closer. The node closest to a key can be called either the root or successor of the key, terms we use
interchangeably.
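The two distance metrics can be compared with a short sketch. The function names are hypothetical, and the 4-bit identifiers in the usage note are chosen only to keep the arithmetic readable.

```python
def kademlia_root(key: int, node_ids):
    """Root under the XOR metric: the node whose ID XOR key is smallest."""
    return min(node_ids, key=lambda n: n ^ key)

def chord_successor(key: int, node_ids, bits: int):
    """Root under the successor metric: first node at or after key, modulo 2^bits."""
    return min(node_ids, key=lambda n: (n - key) % (1 << bits))
```

For nodes 0001, 0110, and 1100 and key 0111, the XOR metric selects 0110 (which shares the longest prefix with the key), whereas the successor relationship selects 1100.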
The main primitive that DHTs contribute is lookup, in which a node can efficiently discover
a key’s root, even though each node only has a partial view of the entire network. The lookup
Footnote 2: While structured gossiping based on consistent hashing could reduce the bandwidth overhead needed to
disseminate a message [31], we use a randomized epidemic scheme for simplicity.
[Figure 2.3 panels: (a) Nodes in the Kademlia identifier space; (b) One node’s view of Kademlia’s ID-space. Both panels span the 160-bit ID space from 00...00 to 11...11.]
Figure 2.3: Kademlia DHT. The left-hand graph visualizes all system nodes in a Kademlia tree
based on their node identifiers. The right-hand graph displays the routing tables of one node
(shown larger in red), who knows at least one node in each subtree (separated by thin blue lines)
of increasing distance from it.
[Figure 2.4 panels: (a) Recursive routing; (b) Iterative routing. Nodes shown: s, a, b, r.]
Figure 2.4: Recursive and iterative styles of Chord DHT routing
some key k, then s’s routing table almost always contains either the closest node to k, or some node
that is roughly one-half the distance to k. Repeated halvings of the distance lead to a logarithmic
number of routing hops to find the key’s root, node r.
Or, as described using the XOR metric, s discovers a node whose distance to k is at least one
bit shorter. This permits s’s lookup to visit a sequence of nodes with monotonically decreasing
distances $[d_1, d_2, \ldots]$ to k, such that the encoding of $d_{i+1}$ as a binary number has one fewer bit than
$d_i$. As a result, again, the expected number of iterations for s to discover the closest node to k is
logarithmic in the number of nodes.
These hops within the routing overlay can be performed in either an iterative or a recursive
fashion, as visualized on a Chord ring in Figure 2.4, where source node s initiates a lookup for
has also offered its SureRoute [3] overlay routing service to find faster or more reliable paths by
routing through intermediate nodes in its deployed clusters.
2.2.4.2 Fixing non-transitive routing
The basic way to handle non-transitive connectivity is to simply route around the problem. To do
so, we make two modifications to the above key-value indexes: (1) modifying routing tables to
include indirect entries and (2) adding support for a basic forward primitive.
First, we now represent entries in our routing tables as paths to the target destinations, where
direct connectivity allows a path of length one, while indirect connectivity requires paths of length
two or more. Using Figure 2.5 as an example, node a’s routing table would include the following
entries:
Destination    Routing entries
b              〈c, b〉
c              〈c〉
Second, we introduce a forward primitive, which takes a destination and message, and wraps
the two for overlay source routing. Given the above example with routing entry b → 〈c, b〉,
forward(b, M) sends (b, M) to c, who forwards (a, M) to b in turn. If this message is part of
a two-way communication such as RPC, b responds with (a, M′) to c, who forwards (b, M′) to a.
(And more generally, b can choose to update its own routing table with the entry a → 〈c, a〉.)
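The path-valued routing entries and the forward primitive can be sketched as below. This is an illustration under simplifying assumptions: the Node class, its in-memory network dictionary, and the relay and receive methods are hypothetical stand-ins for real message transport, and only paths with a single intermediate are handled.

```python
class Node:
    def __init__(self, name, network):
        self.name = name
        self.network = network  # name -> Node; stands in for the real transport
        self.routes = {}        # destination -> path, e.g. {"b": ["c", "b"]}
        self.inbox = []         # (source, message) pairs delivered to this node
        network[name] = self

    def forward(self, dest, msg):
        """Send msg to dest along the stored path (at most one relay in this sketch)."""
        path = self.routes[dest]
        if len(path) == 1:
            self.network[dest].receive(self.name, msg)
        else:
            # First hop carries the destination tag: (dest, msg) goes to the relay.
            self.network[path[0]].relay(self.name, dest, msg)

    def relay(self, src, dest, msg):
        # Second hop carries the source tag: (src, msg) goes to the destination.
        self.network[dest].receive(src, msg, via=self.name)

    def receive(self, src, msg, via=None):
        self.inbox.append((src, msg))
        if via is not None:
            # Optionally learn the reverse indirect route, e.g. a -> <c, a>.
            self.routes.setdefault(src, [via, src])
```

With a's table holding b → 〈c, b〉, a call to a.forward("b", "M") delivers ("a", "M") to b through c, and b can answer over the reverse entry it learned.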
This basic approach can be used in overlays with both global and partial views. In the con-
text of consistent hashing, every node keeps a routing entry about every other node in the above
manner. In fact, our failure detection subsystem (§2.2.2) provides special support for precisely this
functionality: Recall that before suspicion of a node’s failure is raised after direct probing fails, an
initiator will try to use several intermediates next (five in our implementation). Any such intermediate
node that returns success at contacting the target can immediately be added to the initiator’s
routing table as an indirect route.
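The way failure detection doubles as route discovery can be sketched as follows. The function name, the can_reach callback, and the "self" label for the local node are hypothetical; a real implementation would issue network probes with timeouts rather than consult a callback.

```python
import random

def probe_with_intermediates(target, peers, can_reach, k=5, rng=random):
    """Return a path to `target`, or None if suspicion of failure should be raised.

    can_reach(src, dst) is an assumed callback reporting whether src can contact dst.
    """
    if can_reach("self", target):
        return [target]                      # direct connectivity: path of length one
    candidates = [p for p in peers if p != target]
    for relay in rng.sample(candidates, min(k, len(candidates))):
        if can_reach("self", relay) and can_reach(relay, target):
            return [relay, target]           # usable as an indirect routing entry
    return None                              # direct and indirect probes all failed
```

A returned two-element path can be stored directly as an indirect routing entry for the target.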
In the DHT context, at least when using recursive routing, we only need to be concerned with
such non-transitivity in the context of short links (required for correctness), because the routing
structure offers flexibility in the choice of long links (used for efficiency) [78]. To maintain these
short links, nodes can reactively [29, 162] or periodically [155] exchange copies of their local
routing tables; in doing so, nodes discover indirection points with which to reach other local peers,
even if no direct path exists. This process is akin to learning the link state of all local peers.
This solves such problems as inconsistent roots in DHTs [60], where nodes, when only using
direct communication as a means for determining liveness, may have different views of who is the
closest node to some key.
Iterative routing presents more problems when confronted with non-transitive routing, as nodes
may not have direct connectivity to the next-hop peers returned by intermediates, even though those
intermediates do (the so-called invisible nodes of [60]). In practice, this problem is mitigated in two
ways. First, implementations often cache additional information beyond O(logN) routing state
as a performance optimization (caching live nodes aids in accurate RTT information to set good
timeouts; caching unreachable nodes prevents unnecessary delays while waiting for a request to
time out). It is especially useful also to include nodes with indirect connectivity, accompanied
by their routing entry, in this cache. (The state requirements for doing so are reasonable, given the
relative scarcity of such nodes with respect to a single node.) Second, iterative routing implementations can keep
a window of multiple outstanding routing requests, and thus avoid the performance hit of a single
node’s failure or delayed response. As a request approaches the key’s root, indirect requests can
be sent through the root’s immediate neighbors, and thus avoid inconsistent roots.
2.3 Weakening the consistency model and adding locality with Coral
The academic community has proposed the use of distributed hash tables as an efficient, scalable,
and robust storage layer for building higher-level peer-to-peer applications. We now ask whether
DHTs are well-suited for some desired applications of the wider Internet population. For example,
can DHTs be used to implement file-sharing, by far the most popular peer-to-peer application?
Or could DHTs replace proprietary content distribution networks (CDNs), such as Akamai, with
a more democratic client caching scheme that saves it from flash crowds at no cost to the server
operator?
We suggest that the answer to these questions is largely no. DHTs fail to meet the needs of
these real peer-to-peer applications for three main reasons.
DHTs provide the wrong abstraction. Suppose many thousands of nodes store a popular music
file or cache a widely-accessed web page. How might a hash table help others find the data? Using
the hash of a web object’s URL as a key, one might use the DHT to store an index: a list of every
node that has the object. That is, have puts append values (e.g., IP addresses) under a given key,
while gets should fetch a consistent list of all such servers caching the content.
DHTs typically replicate popular data for load balancing, but replication helps only with gets,
not puts. Any node seeking a web page will likely also cache it: Any URL-to-node-list mapping
would be updated almost as frequently as it is fetched. Thus, any single node charged with main-
taining the URL-to-node mapping for a popular object can itself become a hot-spot in the indexing
infrastructure.
DHTs have poor routing and data locality. Though some DHTs make an effort to route requests
through nodes with low network latency [28, 47, 78], the last few hops in any lookup request are
essentially random. Thus, a node might need to route a query half-way around the world to learn
that its neighbor is caching a particular data object. This is of particular concern for any peer-to-
peer CDN, as the average node may have considerably worse network connectivity than the web
server itself.
Even worse, if a get just returns a list of all nodes (or some randomly-chosen subset) caching
the object, the initiating node has no easy way to determine which of them is nearby and
thus the most efficient source from which to download the content.
Store-based DHTs control data placement. An alternative approach to using a DHT as an index-
ing layer, with its “write-many, read-many” access patterns, is to store actual content in the hash
table and thus support “write-once, read-many” access patterns. Unfortunately, this approach—
taken by CFS [45], PAST [161], Squirrel [89], and CobWeb [172], among others—wastes both
the key, in which case the key-value pair will be stored at a more distant node. More specifically,
the forward phase terminates whenever the storing node happens upon another node that is both
full and loaded for the key:
1. A node is full with respect to some key k when it already stores l values for k whose expiry
times—each calculated as the sum of the value’s insertion time and its specified time-to-live
(TTL)—are all at least one-half that of the new value (as calculated from present).
2. A node is loaded with respect to k when it has received more than the maximum leakage
rate β requests for k within the past minute.
In our experiments, l = 4 and β = 12, the latter meaning that under high load, a node claims
to be loaded for all but one store attempt every 5 seconds. This prevents excessive numbers of
requests from hitting the key’s closest nodes, yet still allows enough requests to propagate towards
the root to ensure that nodes near to the key retain fresh values.
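The two conditions can be expressed as predicates over per-key state. This is a sketch under stated assumptions: the class KeyState and its fields are hypothetical, and "full" is read here as "at least l stored values whose remaining lifetime is at least half the new value's TTL", which is one interpretation of the definition above.

```python
from collections import deque

MAX_VALUES = 4      # l in the text
LEAKAGE_RATE = 12   # beta in the text, in requests per minute

class KeyState:
    """Per-key bookkeeping kept by one node (times are in seconds)."""

    def __init__(self):
        self.expiries = []       # absolute expiry time of each stored value
        self.requests = deque()  # arrival times of recent requests for the key

    def is_full(self, new_ttl, now):
        # Count stored values that outlive half of the new value's TTL.
        long_lived = [e for e in self.expiries if e - now >= new_ttl / 2]
        return len(long_lived) >= MAX_VALUES

    def record_request(self, now):
        self.requests.append(now)

    def is_loaded(self, now):
        # Forget requests older than one minute, then compare against beta.
        while self.requests and now - self.requests[0] > 60:
            self.requests.popleft()
        return len(self.requests) > LEAKAGE_RATE
```

With β = 12, the thirteenth request within a minute is the first to find the node loaded, which matches the rate of roughly one admitted store attempt every 5 seconds.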
In the forward phase, the routing layer makes repeated RPCs to contact nodes successively
closer to the key. Each of these remote nodes returns (1) whether the key is loaded and (2) the
number of values it stores under the key, along with the minimum expiry time of any such values.
The inserting client uses this information to determine if the remote node can accept the store,
potentially evicting a value with a shorter TTL. This forward phase terminates when the client
node finds either the node closest to the key, or a node that is full and loaded with respect to the
key. The client node places all contacted nodes that are not both full and loaded on a stack, ordered
by XOR distance from the key.
During the reverse phase, the inserting client attempts to insert the value at the remote node
referenced by the top stack element, i.e., the contacted node that is closest to the key. If this
operation does not succeed—perhaps due to others’ concurrent insertions—the client node pops
the stack and tries to insert on the new stack top. This process is repeated until a store succeeds or
the stack is empty.
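The forward and reverse phases can be sketched as a single function. Several simplifications are assumed: StoreNode and sloppy_put are hypothetical names, the loaded flag is set by the caller, a full node simply refuses a store instead of comparing TTLs for eviction, and sorting all nodes by XOR distance stands in for the logarithmic sequence of routing RPCs.

```python
class StoreNode:
    def __init__(self, node_id, capacity=4):
        self.node_id = node_id
        self.capacity = capacity  # l, the maximum number of values per key
        self.values = []
        self.loaded = False       # set externally in this sketch

    def is_full(self):
        return len(self.values) >= self.capacity

    def try_store(self, value):
        if self.is_full():
            return False          # eviction by TTL is omitted here
        self.values.append(value)
        return True

def sloppy_put(key, value, nodes):
    """Two-phase insert; returns the node that accepted the value, or None."""
    route = sorted(nodes, key=lambda n: n.node_id ^ key, reverse=True)
    stack = []
    for node in route:                      # forward phase: approach the key
        if node.is_full() and node.loaded:
            break                           # do not go any closer
        stack.append(node)                  # closest contacted node ends on top
    while stack:                            # reverse phase: back away from the key
        node = stack.pop()
        if node.try_store(value):
            return node
    return None
```

When the node closest to the key is both full and loaded, the value lands on the next-closest node instead, which is the sloppy behavior that prevents saturation at the root.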
This two-phase algorithm avoids tree saturation by storing values progressively further from
the key. Still, eviction and the leakage rate β ensure that nodes close to the key retain long-lived
values, so that live keys remain reachable: β nodes per minute that contact an intermediate node
(including itself) will go on to contact nodes closer to the key. For a perfectly-balanced tree, the
key’s closest node receives only $\beta \cdot (2^b - 1) \cdot \lceil (\log n)/b \rceil$ store requests per minute, when fixing b bits
per iteration.
Proof sketch. Each node in a system of n nodes can be uniquely identified by a string S of
log n bits. Consider S to be a string of b-bit digits. A node will contact the closest node to the
key before it contacts any other node if and only if its ID differs from the key in exactly one digit.
There are $\lceil (\log n)/b \rceil$ digits in S. Each digit can take on $2^b - 1$ values that differ from the key. Every
node that differs in one digit will throttle all but β requests per minute. Therefore, the closest node
receives a maximum rate of $\beta \cdot (2^b - 1) \cdot \lceil (\log n)/b \rceil$ RPCs per minute.
Irregularities in the node ID distribution may increase this rate slightly, but the overall rate of
traffic is still logarithmic, while in traditional DHTs it is linear. §2.3.3.2 provides supporting
experimental evidence.
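The bound is easy to evaluate numerically. The sketch below assumes a base-2 logarithm, which follows from identifying each of the n nodes by a string of log n bits; the function name is hypothetical.

```python
import math

def closest_node_store_bound(n, beta=12, b=1):
    """Upper bound on store RPCs per minute reaching the key's closest node."""
    digits = math.ceil(math.log2(n) / b)   # number of b-bit digits in a node ID
    return beta * (2 ** b - 1) * digits
```

For n = 494, β = 12, and b = 1, there are 9 digits, so the bound is 12 · 1 · 9 = 108 RPCs per minute.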
2.3.1.3 Retrieving values
To retrieve the value associated with a key k, a node simply traverses the ID space with RPCs.
When it finds a peer storing k, the remote peer returns k’s corresponding list of values. The node
terminates its search and get returns. The requesting client application handles these redundant
references in some application-specific way, e.g., an application proxy contacts multiple sources
addressed by these values in parallel to download cached content.
Note that, as multiple stores of the same key will be spread over multiple nodes, the values
(addresses) retrieved by the application are distributed among those stored. Thus, DSHTs provide
load balancing both within the indexing layer (where the requests are handled) and between peers
using the indexing layer (to where the values point).
2.3.1.4 Handle concurrency via combined put-and-get
If we use a DSHT for applications such as web content distribution—as we do in Chapter 3 with
CoralCDN—we would want to enable participating nodes to offload all traffic from an origin web-
server by using a put to index themselves as caching an object, and then having other nodes
discover these cached copies via a DSHT get. Ideally, each object would only be downloaded
from the origin server a single time, even when the system experiences an abrupt flash crowd for a
single object/key.
However, the protocols we described above fail to provide this property, even beyond get fail-
ures that may occur due to routing problems or packet loss. (These network failures are minimized
by our use of multiple outstanding RPCs and requirement that lookups to multiple nodes closest
to the key all fail before the higher-level get fails.) Namely, redundant fetches may occur also
because a race condition exists in the protocol.
Consider that a key k is not yet inserted into the system. Two nodes both execute a get(k).
Having failed, these nodes fetch the content from the origin server, then optimistically perform a
put(k, self) so that other peers can immediately get the data from them. (We discuss such optimistic
references in §3.2.3.2.) On the node closest to k, however, the operations may serialize with both
gets being handled (and thus returning no values) before either put.
Simply inverting the order of operations is even worse. If multiple nodes first optimistically
perform a put, followed by a get, they can discover one another and effectively form cycles waiting
for one another, with nobody actually fetching data from the origin server (a form of distributed
deadlock).
To eliminate this condition, we extend insert operations to provide return status information,
like test-and-set in shared-memory systems. Specifically, we introduce a single put+get RPC
which performs both operations. The RPC behaves similarly to a put as described above, but also
returns the first values discovered in either direction to satisfy the “get” portion of the request.
(Values returned during the forward put direction help performance, as a put may need to traverse
past the first stored values, while a get should return as soon as the first values are found; values
returned during the reverse put phase prevent this race condition.)
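The test-and-set flavor of the combined operation can be illustrated on a single index node. This sketch is deliberately reduced: IndexNode and put_and_get are hypothetical names, and the real RPC spans the forward and reverse phases across several nodes rather than one locked dictionary.

```python
import threading

class IndexNode:
    """One index node handling the combined operation atomically."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}  # key -> list of addresses that cache the object

    def put_and_get(self, key, value):
        """Record `value` under `key` and return the values stored before it."""
        with self._lock:
            existing = list(self._values.get(key, []))
            self._values.setdefault(key, []).append(value)
            return existing
```

Because the read and the write happen as one step, exactly one of two racing callers sees an empty list and fetches from the origin server; the other discovers the first caller's address and fetches from it.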
within the context of the local level-2 cluster. Thus, provided that the key is not already loaded
and full, the node continues its insertion in the level-1 cluster from the point at which the key was
inserted in level 2, much as in the retrieval case. Again, Coral traverses the ID-space only once. As
illustrated in Figure 2.7, this practice results in a loose hierarchical cache, whereby a lower-level
cluster contains nearly all data stored in the higher-level clusters to which its members also belong.
To enable such cluster-aware behavior, the headers of every Coral RPC include the sender’s
list of clusters to which it belongs. The recipient uses this information to demultiplex requests
properly, i.e., a recipient should only consider a put and get for those levels on which it shares a
cluster with the sender. Additionally, this information drives routing table management: (1) nodes
are added or removed from the local cluster-specific routing tables accordingly, and (2) cluster
information is accumulated to drive cluster management, as described next.
2.3.2.3 Joining and managing clusters
To join the Coral network, a Coral node simply needs to contact any existing node in the system
(either using some well-known IP address, DNS lookup, or an anycast system per §4). Once a
Coral node has joined the network, it can build its routing tables by making several lookup queries
to selected keys (especially its own node identifier to explore its local region of the ID-space).
However, for joining non-global clusters, Coral adds one important requirement: A node will
only join an acceptable cluster, where acceptability requires that the pair-wise RTTs to 80% of the
nodes be below the cluster’s threshold (a configurable parameter). A node can easily determine
whether this condition holds by recording minimum RTTs to some subset of nodes belonging to
the cluster.
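The acceptability test reduces to a simple fraction check over recorded RTTs. The function name and the millisecond units are assumptions of this sketch; integer arithmetic is used to avoid floating-point edge cases at exactly 80%.

```python
def cluster_acceptable(min_rtts_ms, threshold_ms, percent=80):
    """True if at least `percent` of sampled members are below the RTT threshold."""
    if not min_rtts_ms:
        return False  # no measurements, so no basis for joining
    within = sum(1 for rtt in min_rtts_ms if rtt < threshold_ms)
    return 100 * within >= percent * len(min_rtts_ms)
```

For example, with a 20 ms threshold, minimum RTTs of 10, 12, 15, 18, and 90 ms make the cluster acceptable, since four of the five sampled members (80%) are below the threshold.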
While nodes learn about clusters as a side effect of normal lookups, Coral also exploits its
DSHTs to store hints. When Coral starts up, it uses a built-in fast traceroute mechanism to de-
termine the addresses of routers up to five hops out. Excluding any private IP addresses (per
RFC-1918 [152]), Coral uses these router /24 prefixes as keys under which to index clustering
hints in its DSHTs. More specifically, a node a stores mappings from each router address to its
own IP address and UDP port number. When a new node b, sharing a gateway with a, joins the
network, it will find one or more of a’s hints and quickly cluster with it, assuming a is, in fact,
near b.
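Deriving hint keys from traceroute output can be sketched with the standard ipaddress module. The function name hint_keys is hypothetical, the input is assumed to be an ordered list of IPv4 router addresses, and how a prefix is turned into a 160-bit DSHT key is not shown.

```python
import ipaddress

# The three private address blocks defined by RFC 1918.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def hint_keys(router_addrs, max_hops=5):
    """Return the distinct public /24 prefixes of routers up to max_hops out."""
    keys = []
    for addr in router_addrs[:max_hops]:
        ip = ipaddress.ip_address(addr)
        if any(ip in net for net in RFC1918):
            continue  # private routers are useless as global hints
        prefix = str(ipaddress.ip_network(f"{addr}/24", strict=False))
        if prefix not in keys:
            keys.append(prefix)
    return keys
```

A node would then store its own IP address and UDP port under each returned prefix, so that a newcomer behind the same routers finds it with an ordinary get.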
Nodes continuously collect clustering information from peers: All RPCs provide round-trip-times
to one’s neighbors and their cluster membership. Each level-i cluster is named by a randomly-chosen
160-bit cluster identifier, while the level-0 cluster ID is predefined as $0^{160}$.
Every five minutes, nodes consider changing their cluster membership based on this piggy-
backed data. If data indicates that an alternative candidate cluster is desirable, a node first validates
the collected data by contacting several nodes within the candidate cluster by routing to selected
keys. A node can also form a new singleton cluster when 50% of its accesses to members of its
present cluster do not meet the RTT constraints. (The different thresholds for joining and leaving
a cluster are used to prevent rapid oscillations in cluster membership.) If probes indicate that 80%
of a cluster’s members are within acceptable RTTs and the alternate cluster is larger (in the loga-
rithmic scale), a node switches to the new cluster. If multiple clusters are acceptable, then Coral
chooses the largest cluster.
Unfortunately, Coral has only rough approximations of cluster size, based on its routing-table
size. If nearby clusters i and j are of similar sizes, inaccurate estimations could lead to oscillation
as nodes flow back-and-forth. To perturb an oscillating system into a stable state, Coral employs a
preference function δ that shifts every hour. A node selects the larger cluster only if the following
holds:
$$\left| \log(\mathit{size}_i) - \log(\mathit{size}_j) \right| > \delta\left(\min(\mathit{age}_i, \mathit{age}_j)\right)$$
where age is the current time minus the cluster’s creation time. Otherwise, a node simply selects
the cluster with the lower cluster ID.
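The selection rule can be sketched as follows. The Cluster record and the function names are hypothetical, ages are assumed to be in seconds, and δ is modeled as the square wave this section uses (0 during even hours of age, 2 during odd hours).

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    cluster_id: int
    log_size: float  # logarithm of the estimated cluster size
    age_secs: int    # current time minus the cluster's creation time

def delta(age_secs):
    """Square-wave preference function: 0 on even hours, 2 on odd hours."""
    return 0 if (age_secs // 3600) % 2 == 0 else 2

def prefer(ci, cj):
    """Pick the larger cluster only if the size gap exceeds delta; else the lower ID."""
    gap = abs(ci.log_size - cj.log_size)
    if gap > delta(min(ci.age_secs, cj.age_secs)):
        return ci if ci.log_size > cj.log_size else cj
    return ci if ci.cluster_id < cj.cluster_id else cj
```

Two clusters whose log sizes differ by 1 are therefore ordered by size while δ = 0, and by cluster ID while δ = 2, which breaks the symmetry that would otherwise let nodes flow back and forth.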
We use a square wave function for δ that takes a value 0 on an even number of hours and 2
on an odd number. For clusters of disproportionate size, the selection function immediately favors
the larger cluster. Otherwise, δ’s transition perturbs clusters to a steady state: Should clusters of
similar size continuously exchange members when δ is zero, as soon as δ transitions, nodes will
all flow to the cluster with the lower cluster ID. Should the clusters oscillate when δ = 2 (as the
struct coral_qry_hdr {            /* RPC request headers */
  qry_hdr {
    u_int32 netid;                /* Coral network ID known to S */
    id srvid;                     /* 160-bit nodeID of recipient R */
    sock_addr srv;                /* Known IP address of R */
    node_info {                   /* Description about S */
      id clntid;                  /* 160-bit nodeID of S */
      vec<sock_addr> a;           /* Public IP address(es) of S */
      u_int32 flags;              /* Services run on S */
    }
  }
  hierarchy_info {                /* Information about sender’s clusters */
    vec<cluster_info> {           /* List of clusters to which S belongs */
      cluster_desc {              /* to properly scope request */
        id clusterid;             /* 160-bit cluster identifier */
        u_int32 level;            /* Cluster level (1, 2, ...) */
      }
      enum policy_type;           /* Type of clustering performed */
      u_int32 logsize;            /* Log of estimated size of cluster */
      u_int32 crtime;             /* Creation time in secs (since Epoch) */
    }
  }
};
struct coral_rsp_hdr {            /* RPC response headers */
  rsp_hdr {
    u_int32 net_id;               /* Coral network ID known to R */
    sock_addr clnt;               /* Known IP address of S */
    node_info srv;                /* Desc about R, same as in qry_hdr */
  }
  hierarchy_info hinfo;           /* Same as in coral_qry_hdr */
};
Figure 2.8: Format of Coral RPC headers from sender S to recipient R
[Figure 2.11 panels: (a) World view of level-1 clusters (60 ms threshold); (b) United States view of level-2 clusters (20 ms threshold). Legend: letter size denotes 1, 2, or 3 nodes.]
Figure 2.11: Clustering within the Coral deployment on PlanetLab. Each unique, non-singleton
cluster is assigned a letter on the map; the size of the letter corresponds to the number of collocated
nodes in the same cluster.
We have plotted the location of our nodes by latitude/longitude coordinates. If two nodes belong
to the same cluster, they are represented by the same letter. As each PlanetLab site usually collo-
cates several servers, the size of the letter expresses the number of nodes at that site that belong to
the same cluster. For example, the very large “H” (world map) and “A” (U.S. map) correspond to
nodes collocated at U.C. Berkeley. We did not include singleton clusters on the maps to improve
readability; post-run analysis showed that such nodes’ RTTs to others (surprisingly, sometimes
even at the same site) were above the Coral thresholds.
The world map shows that Coral found natural divisions between sets of nodes along geospatial
lines at a 60 ms threshold. The map shows several distinct regions, the most dramatic being the
Eastern U.S. (70 nodes in “I”), the Western U.S. (37 nodes in “H”), and Europe (19 nodes in “J”).
The close correlation between network and physical distance suggests that speed-of-light delays
dominate round-trip-times. Note that, as we did not plot singleton clusters, the map does not
include three Asian nodes (in Japan, Taiwan, and the Philippines, respectively).
The United States map shows level-2 clusters again roughly separated by physical locality.
The map shows 16 distinct clusters; obvious clusters include California (22 nodes in “A”), the
Pacific Northwest (9 nodes in “J”), the South, the Midwest, and so forth. The Northeast Corridor
“O” cluster contains 29 nodes, stretching from North Carolina to Massachusetts. One interesting
aspect of this map is the three separate, non-singleton clusters in the San Francisco Bay Area.
Close examination of individual RTTs between these sites shows widely varying latencies; Coral
clustered correctly given the underlying network topology.
2.3.3.2 Load balancing
To evaluate whether Coral prevents tree saturation within its indexing system, we instrumented
Coral nodes to generate requests at very high rates, all for the same key.
Figure 2.12 shows the extent to which a DSHT balances requests to a single key. In this
experiment, we ran three nodes on each of the earlier hosts for a total of 494 nodes. We configured
the system as a single level-0 cluster. At the same time, all nodes began to issue back-to-back
put and get requests at their maximum (non-concurrent) rates. All operations referenced the same

[Figure 2.12 plot axes: Requests / Minute (ticks from 0 to 84 in steps of 12) and Distance to popular key (near to far).]
Figure 2.12: Load balancing per key within Coral. The total number of put RPCs hitting each
Coral node per minute, sorted by distance from node ID to specific target key.
(randomly-chosen) key; the values stored during put requests were randomized. On average, each
node issued 400 put and get operation pairs per second, for a total of approximately 12 million
put and get requests per minute, although only a fraction hit the network: Once a node is storing
a key, get requests are satisfied locally, and once it is full and loaded, each node only allows the
leakage rate β put RPCs “through” it per minute.
The graphs show the number of put RPCs that hit each node in steady-state, sorted by the XOR
distance of the node’s ID to the key. During the first minute, the closest node received 106 put
RPCs. In the second minute, as shown in Figure 2.12, the system reached steady-state with the
closest node receiving 83 put RPCs per minute. Recall that our equation in §2.3.2 predicts that
it should receive $\beta \cdot \lceil \log n \rceil = 108$ RPCs per minute. The difference between the analytically-computed
load (12 · 9) and the experimentally-observed load of approximately (12 · 7) is due to
irregularities in the node ID distribution, as nodes choose their identifiers at random and the ana-
lytical results apply to a perfectly-balanced tree. That said, the plot still strongly emphasizes the
efficacy of the leakage rate β = 12, as the number of RPCs received by the majority of nodes is a
low multiple of 12.
No nodes on the far side of the graph received any RPCs. Coral’s routing algorithm explains
this condition: these nodes begin routing by flipping their ID’s most-significant bit to match the
nodes routing to the same key are likely to follow similar paths and discover these cached pointers.
Coral’s flexible clustering provides similar latency-optimized lookup and data placement, and its
rate-limiting and constrained routing algorithms prevent multiple stores from forming hot spots.
A few other DHTs have also leveraged a hierarchical structure. But rather than using self-
organizing clustering algorithms like Coral, SkipNet [84] builds a hierarchy by explicitly grouping
nodes based on domain name in order to support organizational disconnect. Canon takes a similar
approach for multiple DHT geometries [65].
To our knowledge, our integration of Coral into the CoralCDN system, described in the next
chapter, was the first use of a DHT-like structure in a production environment. DHTs have sub-
sequently been used in other peer-to-peer services, e.g., Kademlia [116] serves as the basis for
BitTorrent’s and Azureus’s “trackerless” option [17].
Chapter 3
Building CoralCDN, a Web Content
Distribution Network
This chapter couples the Coral indexing layer with a complete system design for scalable and
efficient web content distribution. Using CoralCDN, web-site publishers who cannot otherwise
afford data centers or commercial CDNs (and hence are rather limited in the size of audience and type
of content they can normally serve) can reach more clients by leveraging the network and server
resources of third parties in a scalable and self-organizing fashion. In other words, CoralCDN helps
democratize web content distribution by automating the process of replicating and propagating
popular content.
3.1 Motivation
To use CoralCDN, a content publisher, end-host client, or someone posting a link to a high-traffic
portal simply appends “.nyud.net” to the hostname in a URL. Through DNS redirection, clients
with unmodified web browsers are transparently redirected to nearby CoralCDN web caches.
These caches cooperate to transfer data from nearby peers whenever possible, thus utilizing the
aggregate bandwidth of participants running the software to absorb and dissipate most of the traffic
for websites using the system. In doing so, CoralCDN minimizes the load on the origin
webserver, and it may even improve the end-to-end latency experienced by clients.
CoralCDN is built on top of the Coral key-value indexing layer, described previously in §2.3.
Two properties make Coral ideal for CDNs. First, Coral allows nodes to locate nearby cached
copies of web objects without querying more distant nodes. Second, Coral prevents hot spots in
the indexing infrastructure, even under degenerate loads.
Taken together, the resulting system enables people to publish content that they previously
could not or would not because of distribution costs. To our knowledge, CoralCDN is the first
decentralized and self-organizing web-content distribution network, and it contains the first peer-
to-peer DNS redirection infrastructure, allowing the system to interoperate with unmodified web
browsers.
Measurements of CoralCDN demonstrate that it allows under-provisioned websites to achieve
dramatically higher capacity, and its clustering provides quantitatively better performance than
locality-unaware systems.
CoralCDN has been deployed on 300-400 servers on the PlanetLab network [145] since March
2004. For the past two years, it has served an average of 20-25 million requests for more than two
terabytes of web content to over one million unique client IPs per day. In doing so, it has allowed
websites to scale to otherwise intractable audience sizes, while needing nearly zero changes to origin
websites to achieve this result.
This chapter describes the usage scenarios of CoralCDN and its system design (§3.2), including
its HTTP proxy and DNS server. We evaluate the system in §3.3, before discussing additional
engineering mechanisms that we introduced in response to operating challenges learned during the
course of CoralCDN’s deployment (§3.4). We conclude by discussing related CDN work (§3.5).
3.2 Design
CoralCDN is comprised of three main parts: (1) a network of cooperative HTTP proxies that han-
dle users’ requests,1 (2) a network of DNS nameservers for nyud.net that map clients to nearby
CoralCDN HTTP proxies, and (3) the underlying Coral indexing infrastructure and clustering ma-
chinery on which the first two applications rely.
3.2.1 Usage models
To enable immediate and incremental deployment, CoralCDN is transparent to clients and requires
no software or plug-in installation. CoralCDN can be used in a variety of ways, including:
• Publishers. A website publisher can change selected URLs in their web pages to so-called
“Coralized” URLs (e.g., modifying the URL http://www.example.com/ to
http://www.example.com.nyud.net/), either through static or dynamic rewriting.
• Third-parties. Any interested third-party (e.g., a poster to a web portal or a Usenet
group) can Coralize a URL before publishing it, causing all embedded relative links in
the web page to use CoralCDN as well.
• Users. CoralCDN-aware users can manually construct Coralized URLs (or automatically
via installed software or browser plugins) when surfing slow or overloaded websites. Based
on the way these URLs are modified, any HTTP redirects or relative links within returned
web pages are automatically Coralized.
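The rewriting that all three usage models share can be sketched in a few lines of Python. This is a minimal illustration, not CoralCDN code: the function name coralize is hypothetical, and keeping any explicit port unchanged is an assumption of the sketch rather than behavior described in the text.

```python
from urllib.parse import urlsplit, urlunsplit

CORAL_SUFFIX = ".nyud.net"  # suffix named in the text


def coralize(url: str) -> str:
    """Return the URL with the CoralCDN suffix appended to its hostname."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host or host.endswith(CORAL_SUFFIX):
        return url  # no hostname to rewrite, or already Coralized
    netloc = host + CORAL_SUFFIX
    if parts.port is not None:
        # Assumption of this sketch: an explicit port is kept as-is.
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(coralize("http://www.example.com/"))
```

Because only the hostname changes, relative links in a returned page resolve against the Coralized hostname without any rewriting of the page itself.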
3.2.2 System overview
Figure 3.1 shows the steps that occur when a client accesses a Coralized URL, such as
http://www.example.com.nyud.net/, using a standard web browser.
1While CoralCDN’s HTTP proxy definitely provides proxy functionality, it is not an HTTP proxy in the
strict RFC-2616 [54] sense. Rather, it serves requests that are syntactically formatted for an ordinary HTTP
server.
Figure 3.1: Steps involved in using CoralCDN. The client first resolves a Coralized URL, then
contacts the resulting HTTP proxy to download the content. Rounded boxes represent CoralCDN
nodes running Coral, DNS, and HTTP servers. Solid arrows correspond to Coral RPCs, dashed
arrows to DNS traffic, dotted-dashed arrows to network probes, and dotted arrows to HTTP traffic.
we still present this original design in order to discuss its observed shortcomings in §3.4.4, which
motivated and influenced our subsequent redesign.
3.2.3 The CoralCDN HTTP proxy
The CoralHTTP proxy satisfies HTTP requests for Coralized URLs. It seeks to provide reasonable
request latency and high system throughput, even while serving data from origin servers behind
comparatively slow network links such as home broadband connections. This point in the design
space requires particular care in minimizing load on origin servers compared to traditional CDNs,
for two reasons. First, many of CoralCDN’s origin servers are likely to have slower network con-
nections than typical customers of commercial CDNs. Second, commercial CDNs often collocate
a number of machines at each deployment site, and then they select a client’s proxy based in part on
the URL requested—effectively partitioning objects across proxies—often having pre-provisioned
their customer’s content. CoralCDN, in contrast, selects proxies only based on client locality and
does not seek to control data placement by design.
In this section, we describe how the CoralHTTP proxy is designed to use the Coral indexing layer
for effective inter-proxy cooperation and for rapid adaptation to flash crowds.
3.2.3.1 Locality-optimized inter-proxy transfers
To aggressively minimize load on origin servers, a CoralHTTP proxy must fetch web pages from
other proxies whenever possible. Each proxy keeps a local cache from which it can immediately
fulfill requests. When a client requests a non-resident URL, the CoralHTTP proxy first attempts to
locate a cached copy of the referenced resource using Coral (a get), with the resource indexed by
a SHA-1 hash of the URL. Only when the Coral indexing layer provides no referrals, or none of its
referrals return the data, does the CoralHTTP proxy attempt to fetch the resource directly from the origin.
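The lookup order in this paragraph (local cache, then Coral referrals, then the origin) can be sketched as follows. This is an illustrative simplification: the callables coral_get, fetch_from_proxy, and fetch_from_origin are hypothetical stand-ins for the Coral RPC and the HTTP transfers, and referrals are tried one at a time rather than over parallel connections.

```python
import hashlib


def coral_key(url: str) -> str:
    """Index key for a resource: the SHA-1 hash of its URL."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def fetch(url, local_cache, coral_get, fetch_from_proxy, fetch_from_origin):
    """Return (body, source), preferring the local cache, then peers, then the origin."""
    body = local_cache.get(url)
    if body is not None:
        return body, "local"
    # Ask the index for proxies that claim to hold the object.
    for proxy in coral_get(coral_key(url)):
        body = fetch_from_proxy(proxy, url)  # None means the referral failed
        if body is not None:
            local_cache[url] = body
            return body, "peer"
    # No referrals, or none of them returned the data: contact the origin.
    body = fetch_from_origin(url)
    local_cache[url] = body
    return body, "origin"
```

Falling through to the origin only as a last resort is what keeps origin load low: once one proxy holds an object, later misses elsewhere are normally served by a peer.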
If a CoralHTTP proxy discovers that one or more other proxies have the data, it attempts to
open TCP connections to multiple other proxies in parallel (currently configured to two), and issues
an HTTP request to the first proxy to which it successfully connects. These pre-established TCP
connections also provide faster failover if the initial request to a different CoralHTTP proxy fails—
expiry times (per §2.3.1.2),3 these longer-lived references will overwrite shorter-lived ones, and
they can be stored on well-selected nodes even under high insertion load. CoralHTTP proxies
renew their pointers stored in Coral to these cached objects every time-to-live period, but only
provided that the content has been requested subsequent to its initial download event.
A CoralHTTP proxy manages its disk using a least-recently-used eviction policy (each proxy is
configured with some maximum-allocated disk space, which is 3-4 GB per proxy on our PlanetLab
deployment). Note, however, that the LRU eviction policy can cause a proxy to evict an item, even
while a reference in Coral still exists.4 Recovery from such failures is handled by our use of
multiple HTTP requests for failover, described above. In practice, however, the working set of our
deployed system experiences sufficiently slow turnover as to make such a situation relatively rare
and thus have little impact.
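A byte-bounded least-recently-used cache of the kind described here can be sketched as below. The class name LRUDiskCache and its in-memory storage are assumptions made for illustration; the deployed proxy stores objects on disk.

```python
from collections import OrderedDict


class LRUDiskCache:
    """Keeps objects up to max_bytes, evicting the least recently used first."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.used = 0
        self._items = OrderedDict()  # url -> body, least recently used first

    def get(self, url):
        body = self._items.get(url)
        if body is not None:
            self._items.move_to_end(url)  # mark as most recently used
        return body

    def put(self, url, body: bytes):
        if url in self._items:
            self.used -= len(self._items.pop(url))
        self._items[url] = body
        self.used += len(body)
        # Evict from the cold end until the cache fits its budget again.
        while self.used > self.max_bytes and self._items:
            _, evicted = self._items.popitem(last=False)
            self.used -= len(evicted)
```

Eviction in this sketch touches only local state, which mirrors the situation in the text: a reference in the index can outlive the cached object, so a requester must be ready to fail over to another referral.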
3.2.4 The (original) CoralCDN DNS server
The CoralDNS server returns IP addresses of CoralHTTP proxies when browsers look up the host-
names of Coralized URLs. To improve locality, it attempts to return proxies near requesting clients.
In particular, whenever a DNS resolver (client) contacts a nearby CoralDNS server instance, the
CoralDNS server seeks to both return proxies within an appropriate cluster and ensure that future
DNS requests from that client will not leave the cluster. CoralDNS also exploits on-the-fly net-
work measurement and stores topology hints in Coral to increase the chances of clients discovering
nearby DNS servers.
More specifically, every instance of CoralDNS is an authoritative nameserver for the do-
this file is written to as content is received from a remote node, these connection callbacks are dispatched
and the newly-written content is sent to other remote nodes downstream in the tree.
3The deployed system uses two-hour TTLs for “successful” transmissions (e.g., status code results in-
cluding 200 (OK), 301 or 302 (Moved), etc.). It also uses 15-minute TTLs for non-transient “unsuccessful”
transmissions (e.g., 403 (Forbidden), 404 (File Not Found), etc.) in order to perform negative-result caching
across the network.
4Coral does not provide any mechanism to delete a reference. If a delete primitive was only best-effort,
an application relying on such a function would still need to be able to handle the cases in which the delete
failed. On the other hand, a delete primitive that provided strong consistency would require much heavier-
weight protocols than Coral is designed to support.
To achieve better locality, a CoralDNS server also attempts to find proxies near to the client’s
DNS resolver based on network hints. That is, it probes the addresses of the last five network
hops to the resolver and uses the results to look for clustering hints stored in the Coral index. To
avoid significantly delaying clients, it maps these network hops using a fast, built-in traceroute-
like mechanism that combines concurrent probes and aggressive timeouts to minimize latency.
The entire mapping process generally requires around two RTTs and 350 bytes of bandwidth. The
server caches these measurement results to avoid repeatedly probing the same client.
The closer a CoralDNS server is to a client, the better its selection of CoralHTTP proxy ad-
dresses will likely be for the client. The CoralDNS server therefore exploits the authority section
of DNS replies to “lock” a DNS client into a good cluster whenever it happens upon a nearby
CoralDNS server. As with the answer section, a CoralDNS server selects the nameservers it re-
turns from the appropriate cluster level and exploits measurement and network hints as well. Un-
like addresses in the answer section, however, it gives nameservers in the authority section a long
TTL (one hour). A nearby CoralDNS server must therefore override any inferior nameservers that
a DNS client may be caching from previous queries; it does so by manipulating the domain for
which returned nameservers are servers. To clients more distant than the level-1 timing threshold,
the CoralDNS server claims to return nameservers for domain L0.nyucd.net. For clients closer than
that threshold, it returns nameservers for L1.L0.nyucd.net. For clients closer than the level-
2 threshold, it returns nameservers for domain L2.L1.L0.nyucd.net. Because DNS resolvers
query the servers for the most specific known domain, this scheme allows closer CoralDNS server
instances to override the results of more distant ones.
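The override scheme can be illustrated with a short sketch. The threshold values of 20 ms and 60 ms are assumptions borrowed from the legend of Figure 3.2, and both function names are hypothetical; only the three domain names come from the text.

```python
LEVEL2_THRESHOLD_MS = 20.0  # assumed level-2 timing threshold
LEVEL1_THRESHOLD_MS = 60.0  # assumed level-1 timing threshold


def authority_domain(rtt_ms: float) -> str:
    """Domain for which a server claims to return nameservers, given client RTT."""
    if rtt_ms < LEVEL2_THRESHOLD_MS:
        return "L2.L1.L0.nyucd.net"
    if rtt_ms < LEVEL1_THRESHOLD_MS:
        return "L1.L0.nyucd.net"
    return "L0.nyucd.net"


def most_specific(cached_domains):
    """Model of resolver behavior: prefer the most specific known domain."""
    return max(cached_domains, key=lambda d: d.count("."))


# A resolver that first met a distant server and later a nearby one
# ends up querying the nameservers returned by the nearby one.
cached = [authority_domain(150.0), authority_domain(8.0)]
print(most_specific(cached))
```

Nesting the domain labels means that no explicit invalidation is needed: a closer server simply hands out a more specific delegation, and the resolver's own selection rule does the rest.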
Unfortunately, although resolvers can tolerate a fraction of unavailable DNS servers, browsers
do not handle bad HTTP servers gracefully. (This is one reason for returning CoralHTTP proxy
addresses with short TTL fields.) As an added precaution, a CoralDNS server only returns Coral-
HTTP proxy addresses which it has recently verified first-hand. This sometimes means syn-
chronously checking a proxy’s liveness status (via a UDP RPC) prior to replying to a DNS query.
[Figure 3.2 plot; axes: Requests / Minute (0 to 300) against Time (sec) (0 to 1200); series: < 20ms, < 60ms, global, origin server.]
Figure 3.2: CoralCDN reducing server load. The number of client accesses to CoralCDN proxies
and the origin HTTP server. Proxy accesses are reported relative to the cluster level from which
data was fetched, and do not include requests handled through local caches.
web page, for a period of 30 minutes. While we recognize that web traffic generally has a Zipf
distribution [19], we are attempting merely to simulate a flash crowd to a popular web page with
multiple, large, embedded images. With 166 clients, we are generating 99.6 requests/second,
resulting in a cumulative download rate of approximately 32,800 Kb/s. This rate is almost two
orders of magnitude greater than the origin web server could handle. Note that this rate was chosen
synthetically and in no way suggests a maximum system throughput. (Indeed, our current deploy-
ment on PlanetLab sees 200-300 Mb/s of traffic on average, although this again is not a technical
maximum, as the PlanetLab deployment is artificially limited by bandwidth-shaping mechanisms,
discussed in §3.4.2.)
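As a back-of-the-envelope check on these figures, the per-client request rate and the average transfer size implied by the stated totals can be computed directly. The sketch assumes that "Kb" denotes kilobits; the variable names are illustrative.

```python
clients = 166
total_requests_per_sec = 99.6
total_rate_kbps = 32_800  # cumulative download rate, assumed to be kilobits/second

# Requests issued by each client per second.
per_client_rate = total_requests_per_sec / clients

# Average size of one response, in kilobits and in kilobytes.
kbits_per_request = total_rate_kbps / total_requests_per_sec
kbytes_per_request = kbits_per_request / 8

print(round(per_client_rate, 2), round(kbits_per_request), round(kbytes_per_request))
```

Under that assumption, each client issues roughly 0.6 requests per second and each response averages about 41 kilobytes.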
3.3.1 Server load
Figure 3.2 plots the number of requests per minute that could not be handled by a CoralHTTP
proxy’s local cache. During the initial minute, a total of 15 requests hit the origin webserver (for
the 12 unique files, shown by dotted purple line with squares). The 3 redundant lookups are due
to the simultaneity at which requests are generated;5 subsequently, requests are handled either
through CoralCDN’s wide-area cooperative cache or through a proxy’s local cache, supporting our
hypothesis that CoralCDN can migrate load off of an origin webserver.
During this first minute, equal numbers of requests were handled by the level-2 (solid red
line with crosses) and level-1 (dashed green line with x’s) cluster caches. However, as the files
propagated into CoralHTTP proxy caches, requests quickly were resolved within faster level-2
clusters. Within 8-10 minutes, the files became replicated at nearly every server, so few client
requests went further than the proxies’ local caches. Repeated runs of this experiment yielded some
variance in the relative magnitudes of the initial spikes in requests to different levels, although the
number of origin server hits remained consistent.
3.3.2 Client latency
Figure 3.3 shows the end-to-end latency for a client to fetch a file from CoralCDN, following the
steps given in §3.2.2. The top graph shows the cumulative distribution function (CDF) of latency
across all PlanetLab nodes used in the experiment. The bottom graph only includes data from
the clients located on five nodes in Asia: Hong Kong (two), Taiwan, Japan, and the Philippines.
Because most nodes are located in the U.S. or Europe, the performance benefit of clustering is
much more pronounced on the graph of Asian nodes.
Recall that this end-to-end latency includes the time for the client to make a DNS request and
to connect to the discovered CoralHTTP proxy. The contacted proxy attempts to fulfill the client
request first through its local cache, then through Coral, and finally through the origin web server.
Because the CoralHTTP proxy implements cut-through routing, clients begin receiving a file as
soon as the proxy begins downloading it.
These figures report three results: (1) the distribution of latency of clients using only a single
level-0 cluster (the dotted blue line), (2) the distribution of latencies of clients using multi-level
clusters (the dashed green line), and (3) the same hierarchical network, but using traceroute during
DNS resolution to map clients to nearby proxies (the solid red line).
5This race condition may be eliminated through the single put+get primitive of §2.3.1.4. Although now
used in our live PlanetLab deployment, this extension to the Coral indexing layer was not available at the
time of these experiments.
[Figure 3.4 plot; axes: Fraction of Requests (0 to 1) and Latency (sec) (0.01 to 10); series: multi-level, single-level.]
Figure 3.4: Latencies for proxy to get keys from Coral
All clients ran on the same subnet (and host, in fact) as a CoralHTTP proxy in our experimental
setup. This would not be the case in the real deployment: We would expect a combination of hosts
sharing networks with CoralHTTP proxies—within the same IP prefix as registered with Coral—
and hosts without. Although the multi-level network using traceroute provides the lowest latency
at most percentiles, the multi-level system without traceroute also performs better than the single-
level system. Clustering has a clear performance benefit for clients, and this benefit is particularly
apparent for poorly-connected hosts.
Figure 3.4 shows the latency of get operations, as seen by CoralHTTP proxies when they
look up URLs in Coral (Step 8 of §3.2.2). We plot the get latency on the single level-0 system versus
the multi-level systems. The multi-level system is 2-5 times faster up to the 80th percentile. After
the 98th percentile, the single-level system is actually faster: under heavy packet loss, which
PlanetLab can (excessively) experience, the multi-level system requires a few more timeouts as it
traverses its hierarchy levels.
3.4 Deployment lessons
CoralCDN has been deployed on 300-400 servers on the PlanetLab network [145] for over three
years. For the past two years, it has served an average of 20-25 million requests for more than two
terabytes of web content to over one million unique client IPs per day. This section describes some
of the mechanisms developed and lessons learned during the course of CoralCDN’s deployment.
3.4.1 Robustness mechanisms
Unlike many web proxies, a CoralHTTP proxy regularly interacts with overloaded or poorly-
behaving origin servers, and our explicit goal is to keep content available in the face of such
conditions.
For example, a fairly normal situation for CoralCDN is the following: First, a portal like Slash-
dot [170] or digg [50] runs a story that links to a third-party website, driving a sudden influx of
readers to this previously unpopular site. Second, a user posts a Coralized link to the same website
in a “comment” to the portal’s story, providing an alternate means to fetch the content. (Similarly,
several third-party browser plugins automatically Coralize their users’ links for either all web con-
tent or for that of special pages, e.g., Slashdot.) Several things can occur given these situations, all
of which should be handled differently.
• The website’s origin server(s) become unavailable before CoralCDN proxies can download
a copy of the content.
• CoralCDN already has a copy of the content, but requests arrive to CoralCDN after the
content’s expiry time has passed; subsequent HTTP requests to the origin webserver result
in failures or error statements.
• CoralCDN already has a copy of the content, but requests arrive to CoralCDN after the
content’s expiry time has passed; subsequent requests to the origin server yield only partial
transfers.
We next consider how CoralCDN handles these different situations.
Negative result caching. CoralCDN may be hit with a flood of requests for a URL that is not
accessible for a variety of reasons (e.g., DNS resolution fails or the webserver does not respond to
TCP connection requests). For these situations, CoralCDN maintains a negative result cache,
whereby after repeated failures, CoralHTTP proxies will locally cache information that certain
hostnames will not resolve or establish TCP sessions. Otherwise, we have seen excessive load and
resource exhaustion on both CoralHTTP proxies and the proxies’ local DNS resolvers given flash
crowds to apparently dead sites (e.g., because the proxy would need to hold open client connections
for several seconds while attempting to resolve or connect to the specified hostname). Notice that,
unlike when an origin server returns a permanent status-code error, this failure status is only cached
locally, as it may be caused by a local network or DNS failure at the proxy and not at the origin
server at all.
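A local negative result cache along these lines can be sketched as follows. The failure threshold and the entry lifetime are hypothetical parameters chosen for illustration, since the text says only that entries are created "after repeated failures"; the class and method names are likewise assumptions.

```python
import time


class NegativeResultCache:
    """Remembers, locally only, hostnames that repeatedly failed to resolve or connect."""

    def __init__(self, failure_threshold=3, ttl_seconds=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed value
        self.ttl_seconds = ttl_seconds              # assumed value
        self._clock = clock
        self._failures = {}       # hostname -> consecutive failure count
        self._blocked_until = {}  # hostname -> time at which the entry expires

    def record_failure(self, hostname):
        count = self._failures.get(hostname, 0) + 1
        self._failures[hostname] = count
        if count >= self.failure_threshold:
            self._blocked_until[hostname] = self._clock() + self.ttl_seconds

    def record_success(self, hostname):
        self._failures.pop(hostname, None)
        self._blocked_until.pop(hostname, None)

    def is_known_bad(self, hostname):
        expiry = self._blocked_until.get(hostname)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            # Entry expired: forget it so that the next request retries the origin.
            self._blocked_until.pop(hostname)
            self._failures.pop(hostname, None)
            return False
        return True
```

A proxy using such a cache can answer requests for an apparently dead site immediately, instead of holding client connections open while it retries resolution or connection setup.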
Serving stale content. CoralCDN proxies obey the expiry times of content, as specified in the
Cache-Control (HTTP/1.1) or Expires (HTTP/1.0) header.6 (If no expiry time is specified,
CoralHTTP proxies default to twelve hours.) As such, after content has expired, CoralHTTP
proxies will attempt to perform a conditional (If-Modified-Since) request. If the content has
not changed, of course, these requests minimize overhead given that the origin can simply respond
with a 304 (Not Modified) message and not resend the same body content.
What happens, however, if the origin server does not respond at all, or simply responds with
some temporary error condition? Rather than return an error to the client, a CoralHTTP proxy
will return stale content instead (and update its expiry timer again). Specifically, if the origin
responds with a 403 (Forbidden), 404 (Not Found), 408 (Timeout), 500 (Internal Server Error), or
6The only caveat is that a CoralHTTP proxy sets a minimum expiry time of some duration (currently
configured as five minutes), and thus does not recognize No-Cache directives as such. Note, however,
that because CoralCDN does not support cookies, SSL bridging, or POSTs, most of the privacy concerns normally
associated with caching such content are alleviated. Our anecdotal experience suggests many sites use such
directives to provide their sites with complete HTTP logs (e.g., for counting page views), not for privacy
reasons. We note that because CoralCDN does not re-write web pages, the origin site will still see requests
for any absolute URLs embedded in the page; thus, normal tricks with web-beacons can provide their sites
with accurate logs.
that the origin site knows that it should serve the request, as opposed to creating a request loop by
dynamically redirecting the client back to CoralCDN again.)
3.4.3 Security mechanisms
CoralHTTP proxies also incorporate a number of security mechanisms to prevent misuse. This
section discusses some of the security protections and limitations of CoralCDN’s current deploy-
ment.
Limited functionality. CoralCDN proxies only support GET and HEAD requests. Therefore,
many of the attacks for which “open” proxies are infamous [134] are simply not feasible. For
example, clients cannot use CoralCDN to POST passwords for brute-force cracking. CoralHTTP
proxies do not support CONNECT requests, and thus they cannot be used to resend email as an
SMTP relay, to forge “From” addresses in web mail, or to spam IRC networks. Furthermore, be-
cause CoralCDN only handles Coralized URLs, it cannot be used directly by pointing a browser’s
proxy settings at it. Although this certainly does not offer any real security, it results in CoralCDN
not appearing on most “open-proxy lists” found on the Web, thus reducing CoralCDN’s exposure.
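The admission rule implied by this paragraph can be sketched as a single predicate. The function name accept_request is hypothetical, and the check is a simplification: it looks only at the request method and the Host value.

```python
ALLOWED_METHODS = {"GET", "HEAD"}  # the only methods the text says are supported
CORAL_SUFFIX = ".nyud.net"


def accept_request(method: str, host: str) -> bool:
    """True only for GET/HEAD requests addressed to a Coralized hostname."""
    hostname = host.split(":", 1)[0].lower().rstrip(".")  # drop any port
    return method in ALLOWED_METHODS and hostname.endswith(CORAL_SUFFIX)
```

Rejecting POST and CONNECT outright removes the request types that open-proxy abuse typically depends on, and the hostname test is what prevents use as a generic browser proxy.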
Domain blacklisting. We currently maintain a global domain-name blacklist that each Coral-
HTTP proxy regularly fetches and reloads during run-time. This blacklist is used to satisfy origin
servers that do not wish their content to be accessed via CoralCDN. Such restrictions are especially
important for sites that perform IP-based authentication, as CoralHTTP proxies may be located
within IP blocks that have been whitelisted by such authorities (this is true for many academic
journals, for instance, as the deployed CoralHTTP proxies mostly run on university networks).
The technical need for such blacklists is not apparent, however; certainly any modern web-
server could reject CoralCDN requests based on its unique User-Agent string (“CoralWebPrx”),
as CoralCDN does not attempt to anonymize itself as an end-user. Indeed, CoralCDN HTTP re-
quests include both Via and X-Forwarded-For headers as well. (In fact, we show in [26] how
web proxies can be detected as such in real-time, even when they do not include required proxy
headers.) In practice, however, many site operators are either unwilling or ill-trained to perform
or digg, suggesting a potential flash crowd; or by combining such rewriting with load detection, so
that their servers only start rewriting requests in order to shed load that they otherwise could not
handle.
When using dynamic rewriting, it is critical that sites check that the request is not from a
CoralHTTP proxy itself; otherwise, an HTTP redirect will be cached and returned to the client,
causing a loop (and the real content to be unreachable). For example, a regular-expression rule for
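Apache’s mod_rewrite would look like the following:
RewriteEngine on
# Skip requests that already come from a CoralCDN proxy (avoids a redirect loop)
RewriteCond %{HTTP_USER_AGENT} !^CoralWebPrx
# Skip requests that carry the coral-no-serve failover marker (§3.4.2)
RewriteCond %{QUERY_STRING} !(^|&)coral-no-serve$
# Redirect everything else to the Coralized hostname
RewriteRule ^/(.*)$ http://example.com.nyud.net/$1 [R,L]
where the first RewriteCond checks that the client is not a CoralCDN proxy and the second checks that the
request is not due to the “failover” condition described in §3.4.2. While CoralHTTP proxies do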
perform a simple check for looping—will the Location header returned cause a simple loop—we
have found more complex, multi-hop loops to occur in practice (mostly through misconfigurations,
not malice). Thus, tracking a client’s request frequency at a proxy also helps protect CoralCDN
against such conditions.
Amusingly, some users early during CoralCDN’s deployment caused recursion in a different
way: by submitting URLs with many copies of nyud.net in a hostname’s suffix: L repeated
instances of nyud.net would cause a CoralHTTP proxy to make L recursive requests, stripping
the last instance off in each iteration. CoralHTTP proxies now verify that URLs do not contain
such expressions.
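One way to detect such hostnames is to count how many times the suffix is stacked at the end of the name. The text does not describe the exact check the proxies perform, so the sketch below is an assumption, and both function names are hypothetical.

```python
CORAL_SUFFIX = ".nyud.net"


def count_coral_suffixes(hostname: str) -> int:
    """Number of consecutive CoralCDN suffixes at the end of the hostname."""
    host = hostname.lower().rstrip(".")
    count = 0
    while host.endswith(CORAL_SUFFIX):
        host = host[: -len(CORAL_SUFFIX)]
        count += 1
    return count


def is_valid_coralized_host(hostname: str) -> bool:
    """Accept only hostnames that carry exactly one CoralCDN suffix."""
    return count_coral_suffixes(hostname) == 1
```

Rejecting any hostname with more than one suffix up front avoids the L recursive requests described above, at the cost of a single string scan per request.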
No cookies. CoralCDN does not support cookies; it deletes any Cookie or Set-Cookie HTTP
headers in both directions. Many websites now manage and set cookies via JavaScript, however,
as opposed to via HTTP headers. This could be problematic from a privacy standpoint if the
CoralHTTP proxies were running on untrusted clients, especially for “third-party” cookies that
could expose linkages between a client’s use of multiple sites. Even worse, however, is that the
“same-origin” RFC-2965 security policies [103] are ill-suited for liberal configuration settings
mance for small organizations hosting nodes, as it is not economically efficient for commercial
CDNs to deploy machines behind most bottleneck links (e.g., LimeLight builds out their network
by deploying higher numbers of servers at still relatively few data centers).
At about the same time as CoralCDN’s deployment, CoDeeN started providing users with a set
of open web proxies [42, 186]. Users can reconfigure their browsers to use a CoDeeN proxy and
subsequently enjoy better performance. The system has also been deployed on PlanetLab, and it
has enjoyed similar success at distributing content efficiently. Although CoDeeN gives participating
users better performance to most websites, CoralCDN’s goal is to give most users better
performance to participating websites, namely those whose publishers have Coralized the URLs.
The two design points pose somewhat different challenges. For instance, CoralCDN takes pains to
greatly minimize the load on under-provisioned origin servers, while CoDeeN has tighter latency
requirements as it is on the critical path for all web requests. More recently, CobWeb [172] has
taken a similar approach to CoDeeN, but using proactive replication for managing content [149].
This approach can yield high performance, but, much like CoDeeN, it has the key-value layer con-
trol data placement on participating nodes, as opposed to just using an indexing system such as
Coral to leverage nodes’ underlying access patterns.