Geographic locality of IP prefixes
Proceedings of the 5th ACM SIGCOMM conference on Internet measurement IMC 05 (2005)
Michael J. Freedman, New York University, mfreed@cs.nyu.edu
Mythili Vutukuru, Nick Feamster, Hari Balakrishnan, Massachusetts Institute of Technology, {mythili,feamster,hari}@csail.mit.edu
ABSTRACT
Information about the geographic locality of IP prefixes can be
useful for understanding the issues related to IP address allocation,
aggregation, and BGP routing table growth. In this paper, we use
traceroute data and geographic mappings of IP addresses to study the
geographic properties of IP prefixes and their implications on
Internet routing. We find that (1) IP prefixes may be too
coarse-grained for expressing routing policies, (2) address
allocation policies and the granularity of routing contribute
significantly to routing table size, and (3) not considering the
geographic diversity of contiguous prefixes may result in
overestimating the opportunities for aggregation in the BGP routing
table.
1. Introduction
Today’s Internet routing infrastructure achieves scalability by
expressing reachability for large groups of IP addresses using a
single IP prefix in a route advertisement. Today’s largest Internet
routing tables provide reachability to hundreds of millions of end
hosts with nearly 200,000 routes [5]. IP addresses that are nearby in
IP space may be geographically or topologically diverse, and vice
versa. This paper quantifies this lack of correspondence. Information
about the geographic location of hosts within IP prefixes can also
help us better understand many issues related to IP address
aggregation and allocation and their effect on BGP routing table
growth.
Our study uses extensive traceroutes and leverages IP-to-geographic
mapping techniques to examine the geographic properties of multiple
destinations within a single prefix. Our dataset includes traceroutes
to at least 4 IP addresses within each prefix of the global routing
table, as well as traceroutes to 1.6 million unique Web clients and
servers that exchanged content over CoralCDN, a popular peer-to-peer
content distribution network [3].
Towards this goal of understanding the geographic properties of IP
prefixes, this paper makes three findings. First, an IP prefix may
express only very coarse geographic information about the
destinations (and networks) that it comprises. This property of the
geographic diversity of hosts within a prefix is important for
techniques that assume that hosts within an IP prefix are
topologically close. As expected, we find that shorter IP prefixes,
which represent a larger portion of the IP address space, tend to
comprise destinations in a large number of geographic locations,
spread over long distances. For example, more than half the prefixes
with mask lengths between 8 and 15 span a distance of more than 100
miles. More surprisingly, we find that longer prefixes, albeit a
small fraction of them, can be quite geographically diverse: about
1.4% of the prefixes with mask lengths between 24 and 31 span a
distance of more than 100 miles, and some /24 prefixes span distances
of more than 10,000 miles!
Second, autonomous systems (ASes) commonly advertise multiple
discontiguous IP prefixes for networks in the same geographic
location. In this case, the Internet routing table must carry
multiple routes for a group of destinations in a single geographic
location and a single AS, because the addresses cannot be expressed
as a single IP prefix. This finding suggests that an Internet routing
infrastructure whose routing granularity more closely reflects
geography could significantly reduce the size of the global routing
tables. Additionally, fragmented address allocation explains 65% of
the cases where a single AS was advertising discontiguous prefixes
from the same location, which suggests that IP address renumbering
could significantly reduce the size of the BGP routing table.
Finally, ASes sometimes announce contiguous prefixes from different
geographic locations. Ongoing studies, such as the CIDR Report [2],
presume that all contiguous prefixes originated by an AS should be
aggregated into a single IP prefix. However, these studies do not
consider whether these prefixes actually represent geographically
diverse networks that are intentionally represented as separate
routes. By ignoring location information, the CIDR Report may
overestimate the opportunities for aggregation by a factor of three.
2. Related Work
Padmanabhan et al. [9] develop a set of techniques to map IP
addresses to geographic locations. One of their techniques clusters
IP addresses at the granularity of an IP prefix to map them to a
location. The authors observe that the accuracy of their method in
mapping an IP address is related to the geographic spread of the
hosts within the
prefix containing that IP address. Our work aims to gain a deeper
understanding of geographic diversity of the hosts within a single
IP prefix.
The geographic locality of IP prefixes is significant for systems
like Network Aware Clustering (NAC) [6], which group hosts that
belong to the same prefix of the BGP routing tables into clusters,
which are used in applications like content distribution and proxy
positioning. These clustering schemes rely on the assumption that
hosts within a prefix are likely to be topologically close and under
the same administrative domain. We investigate the validity of this
assumption in Section 4.1.
Earlier work has also studied the impact of factors like IPv4 address
allocation and aggregation on the growth of the BGP routing table
[1, 7]. Bu et al. [1] find that address fragmentation (where a set of
prefixes originated by an AS cannot be summarized by one prefix) is
the biggest factor contributing to BGP routing table growth. Our
study also reveals many instances where an AS announces discontiguous
prefixes, even from the same geographic location.
The CIDR Report studies contiguous prefixes announced by the same AS
and the missed opportunities for aggregation by ASes [2]. In our
study, we find that contiguous prefixes announced by the same AS are
sometimes geographically far apart; aggregating such prefixes might
conflict with an AS’s traffic engineering or load balancing goals.
Thus, the aggregation opportunities suggested by the CIDR Report
might not all be feasible.
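To make the distinction concrete, the following is a minimal sketch (not the CIDR Report's actual method, and not the tooling used in this paper) that counts routes after aggregating contiguous prefixes per AS versus per (AS, location) pair. It relies on Python's standard `ipaddress.collapse_addresses`; the function names, the AS number, and the announcements are hypothetical.

```python
import ipaddress
from collections import defaultdict

def aggregate(announcements, use_location):
    """Collapse contiguous prefixes within each group.

    announcements: iterable of (prefix, asn, city) tuples.
    Groups are keyed by AS only, or by (AS, city) when use_location is True.
    """
    groups = defaultdict(list)
    for prefix, asn, city in announcements:
        key = (asn, city) if use_location else (asn,)
        groups[key].append(ipaddress.ip_network(prefix))
    return {key: list(ipaddress.collapse_addresses(nets))
            for key, nets in groups.items()}

def route_count(aggregated):
    """Total number of routes left after aggregation."""
    return sum(len(nets) for nets in aggregated.values())

# Hypothetical announcements: four contiguous /24s from one AS, two cities.
ANNOUNCEMENTS = [
    ("10.0.0.0/24", 64500, "New York"),
    ("10.0.1.0/24", 64500, "New York"),
    ("10.0.2.0/24", 64500, "London"),
    ("10.0.3.0/24", 64500, "London"),
]
```

In this toy input, ignoring location lets all four prefixes collapse into one route, while respecting location keeps one route per city.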
3. Data
This paper uses three datasets generated by traceroute measurements
to study the relationship between IP prefixes and locality. We mapped
IP addresses to IP prefixes using longest-prefix matching on a BGP
table from RouteViews [8] from February 27, 2005. This table had
approximately 170,000 IP prefixes.
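As an illustration of longest-prefix matching, here is a minimal IPv4-only sketch, assuming the routing table is available as a list of (prefix, origin AS) pairs. The function names and the sample routes are hypothetical and do not come from the RouteViews data.

```python
import ipaddress

def build_table(routes):
    """Index (prefix, origin_as) routes by prefix length for fast lookup."""
    table = {}
    for prefix, origin_as in routes:
        net = ipaddress.ip_network(prefix)
        table.setdefault(net.prefixlen, {})[int(net.network_address)] = (net, origin_as)
    return table

def longest_prefix_match(table, address):
    """Return the most specific (network, origin_as) covering address, or None."""
    addr = int(ipaddress.ip_address(address))
    # Try the longest mask first so the most specific route wins.
    for length in sorted(table, reverse=True):
        mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
        hit = table[length].get(addr & mask)
        if hit is not None:
            return hit
    return None

# Hypothetical routes; the AS numbers are placeholders, not measured data.
ROUTES = [("10.0.0.0/8", 64500), ("10.1.0.0/16", 64501), ("10.1.2.0/24", 64502)]
TABLE = build_table(ROUTES)
```

The same lookup yields both the prefix and the origin AS of an address, which is how a single table can serve IP-to-prefix and IP-to-AS mapping.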
As shown in Table 1, Clients and Servers refer to traceroutes taken
to Web clients and servers that exchanged content over CoralCDN, a
peer-to-peer content distribution network that receives approximately
10 million HTTP requests per day from widely-dispersed clients [3].
The client traces cover a 14-day period starting on February 13,
2005, while the server trace covers a single day (April 26, 2005).
Each CoralCDN Web proxy (there are approximately 225 such proxies
deployed on PlanetLab [10]) performed a traceroute to every client
destination IP.
While these CoralCDN datasets provide a workload corresponding to a
real user population, we also sought to provide coverage of all IP
prefixes from the RouteViews table. For the Breadth dataset, we
performed traceroutes to 4 uniformly distributed IP addresses per
advertised prefix, using 25 PlanetLab hosts as sources. Note that
these traceroutes traverse IP addresses from multiple prefixes. Thus,
Breadth actually includes many more data points than four per prefix,
especially for transit ASes.
Dataset  Period           Traceroutes  Destinations  IPs      Prefixes
Clients  Feb 13-27, 2005  6,565,844    1,599,228     692,080  45,573
Servers  Apr 26, 2005     71,621       36,387        64,378   9,589
Breadth  Apr 25, 2005     675,797      649,441       246,626  161,974
Table 1: Traceroute datasets. The last two columns show reachable IP
addresses and prefixes: routers and destinations from which ICMP
replies were received.
Dataset  Mapped   Inherited  Prefixes  ASes   Locations
Clients  313,573  180,487    6,136     1,244  1,363
Servers  22,749   5,032      1,693     541    748
Breadth  176,601  130,621    6,828     1,605  1,206
Table 2: IP-to-location assignments.
We use the RouteViews table to map IP addresses to their ASes and DNS
naming heuristics to map IPs to locations, as described in Section
3.1. Table 2 characterizes the number of IP addresses mapped to an AS
number and a location (at the city level). We call this location
inherited if the destination is not reachable itself (whereupon we
assign it to the location of its closest reachable upstream router
instead). The inherited dataset is a subset of mapped, which in turn
is a subset of the destination IPs in Table 1. Table 2 also shows the
total number of unique IP prefixes, ASes, and locations in each
dataset.
3.1 Mapping IP addresses to locations
We use undns [11] to map IP addresses to locations.
undns extracts geographic information from a DNS
name, which is useful because network operators often
use geographically meaningful names for routers. For ex-
ample, a DNS name of the form qwest-gw.n54ny.ip.att.net
refers to an AT&T (AS 7018) router peering with Qwest,
located at an exchange point on 54th street in New York
City. Other studies have also used this approach [9].
Unfortunately, naming heuristics vary between ISPs, and parsing is a
manual process. ISPs may name routers by city name or code, airport
code, or some 4-to-6 letter abbreviation for city and state. In
addition, ISPs incorporate such information in hostnames differently;
even a single AS may use multiple heuristics. For example, Verio
(AS 2914) names gateways in one manner (e.g., att-gw.nyc.verio.net)
and customer addresses in another
(e.g., vl-101.a02.nycmny03.us.ce.verio.net). Router names can also be
ambiguous: for example, nycmng-washng.abilene.ucaid.edu is located in
New York but peers with a router in Washington, D.C. In such special
cases, we manually pinged routers from diverse locations to better
understand their ISP-specific naming heuristics.
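The idea of per-ISP hostname parsing rules can be sketched with regular expressions. The rules below are hypothetical and written only for the example hostnames quoted above; they are not the actual undns ruleset or rule syntax, and the code-to-city tables are illustrative assumptions.

```python
import re

# Hypothetical rules for illustration; NOT the real undns rules or format.
# Each rule pairs a hostname pattern with a table from site code to city.
RULES = [
    # e.g. att-gw.nyc.verio.net -> city code "nyc"
    (re.compile(r"^[\w-]+\.(?P<code>[a-z]{3})\.verio\.net$"),
     {"nyc": "New York, NY"}),
    # e.g. qwest-gw.n54ny.ip.att.net -> site code "n54ny"
    (re.compile(r"^[\w-]+\.(?P<code>[a-z0-9]+)\.ip\.att\.net$"),
     {"n54ny": "New York, NY"}),
]

def hostname_to_location(hostname):
    """Return a city for the hostname, or None if no rule or code matches."""
    name = hostname.lower()
    for pattern, codes in RULES:
        match = pattern.match(name)
        if match:
            return codes.get(match.group("code"))
    return None
```

Note that the Verio customer-address form (vl-101.a02.nycmny03.us.ce.verio.net) matches neither rule, which mirrors the point that a single AS can need several rules.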
undns version 0.1.27a includes manually written host-
name parsing rules for 247 ASes, mostly Tier-1 and Tier-2
ISPs in the US and Europe. We added support for 169 ad-
ditional ASes (including smaller ISPs) and expanded the
tool’s international coverage. The latter is especially important for
the Clients dataset, which includes significant amounts of traffic
from Asia. We spot-checked location estimates after running undns for
some IP addresses in known locations.
Given a city-level location estimate for a particular IP
address, we also assign to it the latitude and longitude co-
ordinates for that city, which allows us to estimate the dis-
tance between two IP addresses.
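The text does not state which distance formula is used. One common choice, given here only as an assumption, is the great-circle (haversine) distance between two points with latitudes \phi_1, \phi_2 and longitudes \lambda_1, \lambda_2 (in radians) on a sphere of radius R (about 3,959 miles for the Earth):

```latex
d = 2R \arcsin\sqrt{\sin^2\!\left(\frac{\phi_2 - \phi_1}{2}\right)
      + \cos\phi_1 \cos\phi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right)}
```

Because locations are resolved only to the city level, any such distance is accurate at best to the size of a city.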
3.2 Limitations of mapping technique
Our data has several limitations. First, a reverse DNS mapping from
IP address to hostname may not exist; such records existed for only
50%-60% of all unique reachable IP addresses. Second, undns may not
have a parsing rule to map the hostname to a location; our ruleset
assigned locations to about one-third of known hostnames. Third,
undns may return incorrect IP-to-AS number mappings. Finally, some
destinations were not reachable via traceroute. We now discuss
mitigating factors for the first two limitations and solutions for
the latter two.
While we could resolve the hostnames of less than 60% of IPs, we
found that internal ISP routers, as opposed to gateway routers or
customer addresses, were more commonly missing reverse DNS records.
These routers are unlikely to express more geographic diversity than
that already captured by gateways and customers, so this limitation
should not significantly affect our results.
Even though undns assigned locations for only one-third of all unique
hostnames, two factors reduced the impact of this poorer coverage.
First, our ruleset provides very good coverage for real-world traffic
patterns, as we supply more detailed rules for popular ASes. In fact,
we resolved the location of 90% of probed IPs in Servers (i.e., when
counting all instances, instead of only unique instances, of
hostnames). Second, the hostnames that had no locality information
were most commonly at the network edges where dynamic addressing is
used (e.g., cable modem, DSL, and dialup connections). This may
inflate the number of hosts with unassigned locations.
undns uses the hostname of an IP address to determine its AS number,
which could cause us to mistakenly believe an ISP is announcing a
discontiguous prefix. For example, an IP address in AS 6395
(Broadwing Communications) carries the hostname suffix
.northwestern.edu, even though its corresponding /14 prefix is
announced solely by Broadwing, which provides transit service for
Northwestern University (AS 103). To solve this problem, we assigned
an AS number to an IP address by performing longest-prefix matching
against the RouteViews table.
Finally, many destinations were not directly reachable when
performing traceroutes: 57% of addresses in Clients, 52% in Servers,
and 76% in Breadth. This limitation is generally due to firewalls
blocking ICMP packets at large portions of the networks’ edges; in
addition, many destinations in Breadth were unused IP addresses. To
solve this problem, we assigned an unreachable destination IP address
to the location of its last reachable upstream router. Our use of
traceroutes enables us both to discover routable IP addresses for
firewalled or unused destinations and to determine the upstream
addresses for inherited locations.
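The inheritance rule can be sketched as follows, assuming each traceroute is represented as an ordered list of (IP, replied) hops ending at the destination. This representation, the function name, and the sample addresses and cities are hypothetical; the sketch also assumes that the last reachable hop is used even if its own location is unknown.

```python
def assign_location(hops, locate):
    """Assign a location to the destination of one traceroute.

    hops:   list of (ip, replied) from source to destination; the last
            entry is the destination itself.
    locate: function mapping an IP to a city, or None if unknown.
    Returns (location, inherited).
    """
    dest_ip, dest_replied = hops[-1]
    if dest_replied:
        return locate(dest_ip), False
    # Destination unreachable: walk back to the last hop that replied.
    for ip, replied in reversed(hops[:-1]):
        if replied:
            return locate(ip), True
    return None, False  # nothing on the path replied

# Hypothetical IP-to-city table using documentation address ranges.
LOCATIONS = {"192.0.2.1": "Boston", "192.0.2.2": "New York"}
TRACE = [("192.0.2.1", True), ("192.0.2.2", True), ("198.51.100.7", False)]
```

Here `assign_location(TRACE, LOCATIONS.get)` inherits the city of the second hop, because the destination itself did not reply.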
4. Results
We first examine the geographic diversity of individual IP prefixes,
paying particular attention to the maximum geographic distance
between any pair of IP addresses within a single prefix. We then
study the extent to which a single AS advertises multiple
discontiguous prefixes that refer to endpoints at a single location,
as well as the causes of these advertisements. Finally, we study the
extent to which an AS advertises contiguous prefixes for hosts in
diverse geographic locations.
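The two per-prefix quantities examined first, the number of distinct locations and the maximum pairwise distance, could be computed along the following lines. This is a sketch under assumptions: it uses the haversine great-circle distance (the formula is not specified in the text), and the coordinates and observations are made-up examples.

```python
import math
from itertools import combinations

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def miles_between(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

def prefix_diversity(observations, coords):
    """Map each prefix to (number of distinct cities, max pairwise miles).

    observations: iterable of (prefix, city); coords: city -> (lat, lon).
    """
    cities_by_prefix = {}
    for prefix, city in observations:
        cities_by_prefix.setdefault(prefix, set()).add(city)
    diversity = {}
    for prefix, cities in cities_by_prefix.items():
        span = max(
            (miles_between(coords[a], coords[b])
             for a, b in combinations(sorted(cities), 2)),
            default=0.0,  # a single city spans no distance
        )
        diversity[prefix] = (len(cities), span)
    return diversity

# Approximate coordinates and made-up observations, for illustration only.
COORDS = {"New York": (40.71, -74.01), "Washington": (38.91, -77.04),
          "Boston": (42.36, -71.06)}
OBSERVATIONS = [
    ("10.0.0.0/8", "New York"),
    ("10.0.0.0/8", "Washington"),
    ("10.0.0.0/8", "New York"),
    ("192.0.2.0/24", "Boston"),
]
```

Comparing cities rather than individual addresses keeps the pairwise step small, since many addresses in a prefix map to the same city.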
4.1 Single prefix with multiple locations
In this section, we study the extent to which a single IP prefix
comprises hosts in multiple geographic locations (thus potentially
obscuring useful information through over-aggressive aggregation).
Figure 1(a) shows the number of distinct geographic locations
contained within a single IP prefix for the Clients dataset. As
expected, shorter prefixes tend to comprise more geographic
locations.
Figure 1(b) shows that not only do the shorter prefixes span more
geographic locations, but their hosts also span a much wider
geographic distance: nearly half of the prefixes in the /8-/15 range
span a distance of more than 100 miles. Several of the prefixes in
this range are either European backbones or broadband access
providers in the United States: for example, from the Clients
dataset, we find that AS 7132 (SBC) advertises a single /16 that
contains 64 distinct locations spread across the United States.
Transit ASes with smaller address allocations also advertised
prefixes containing geographically diverse hosts: e.g., AS 7657 (The
Internet Group, a New Zealand ISP) advertised a /24 whose IP
addresses span 1,400 miles.
Because ASes (particularly US-based backbone ISPs) often allocate
sub-prefixes from a single large IP prefix, we expected that prefixes
allocated to transit ISPs are more likely to be geographically
diverse than those allocated to ASes that do not transit traffic for
others. As shown in Figure 1(c), roughly 97% of all prefixes
announced by stub ASes (and more than 99% of all prefixes in the
/24-/31 range announced by stub ASes) were announced from the same
location.1 The remain-
[Footnote 1] Classifying an AS as a “stub” turns out to be difficult,
as acquisitions, unorthodox transit relationships (e.g., Harvard
University appears as a transit for MIT in RouteViews), etc.,
preclude classifying the leaves of the RouteViews graph as stub ASes.
Instead, we classify an AS as a stub
portant for the Clients dataset, which includes signicant
amounts of trafc from Asia. We spot-checked location
estimates after running undns for some IP addresses in
known locations.
Given a city-level location estimate for a particular IP
address, we also assign to it the latitude and longitude co-
ordinates for that city, which allows us to estimate the dis-
tance between two IP addresses.
3.2 Limitations of mapping technique
Our data has several limitations. First, a reverse DNS
mapping from IP address to hostname may not exist; such
records existed for only 50%-60% of all unique reachable
IP addresses. Second, undns may not have a parsing rule
to map the hostname to a location; our ruleset assigned
locations to about one-third of known hostnames. Third,
undns may return incorrect IP-to-AS number mappings.
Finally, some destinations were not reachable via tracer-
oute. We now discuss mitigating factors for the rst two
limitations and solutions for the latter two.
While we could resolve the hostnames of less than 60%
of IPs, we found that internal ISP routersas opposed to
gateway routers or customer addresseswere more com-
monly missing reverse DNS records. These routers are
unlikely to express more geographic diversity than that al-
ready captured by gateways and customers, so this limita-
tion should not signicantly affect our results.
Even though undns assigned locations for only one-
third of all unique hostnames, two factors reduced the im-
pact of this poorer coverage. First, our ruleset provides
very good coverage for real-world trafc patterns, as we
supply more detailed rules for popular ASes. In fact, we
resolved the location of 90% of probed IPs in Servers (i.e.,
when counting all instances, instead of only unique in-
stances, of hostnames). Second, the hostnames that had
no locality information were most commonly at the net-
work edges where dynamic addressing is used (e.g., cable
modem, DSL, and dialup connections). This may inate
the number of hosts with unassigned locations.
undns uses the hostname of an IP address to determine
its AS number, which could cause us to mistakenly believe
an ISP is announcing a discontiguous prex. For exam-
ple, an IP address in AS 6395 (Broadwing Communica-
tions) carries the hostname sufx .northwestern.edu, even
though its corresponding /14 prex is announced solely
by Broadwing, which provides transit service for North-
western University (AS 103). To solve this problem, we
assigned an AS number to an IP address by performing
longest-prex matching against the RouteViews table.
Finally, many destinations were not directly reachable
when performing traceroutes: 57% of addresses in Clients,
52% in Servers, and 76% in Breadth. This limitation is
generally due to rewalls blocking ICMP packets at large
portions of the networks’ edges. and many destinations
in Breadth were unused IP addresses. To solve this prob-
lem, we assigned an unreachable destination IP address
to the location of its last reachable upstream router. Our
use of traceroutes enables us both to discover routable IP
addresses for rewalled or unused destinations and to de-
termine the upstream addresses for inherited locations.
4. Results
We rst examine the geographic diversity of individual
IP prexes, paying particular attention to the maximum ge-
ographic distance between any two pairs of IP addresses
within a single prex. We then study the extent to which
a single AS advertises multiple discontiguous prexes that
refer to endpoints at a single location, as well as the causes
of these advertisements. Finally, we study the extent to
which an AS advertises contiguous prexes for hosts in
diverse geographic locations.
4.1 Single prefix with multiple locations
In this section, we study the extent to which a single
IP prex comprises hosts in multiple geographic locations
(thus potentially obscuring potentially useful information
by over-aggressive aggregation). Figure 1(a) shows the
number of distinct geographic locations contained within
a single geographic prex for the Clients dataset. As ex-
pected, shorter prexes tend to comprise more geographic
locations.
Figure 1(b) shows that, not only do the shorter prexes
span more geographic locations, but these hosts also span
a much wider geographic distance: nearly half of the pre-
xes in the /8-/15 range span a distance of more than 100
miles. Several of the prexes in this range are either Eu-
ropean backbones or broadband access providers in the
United States: for example, from the Clients dataset, we
nd that AS 7132 (SBC) advertises a single /16 that con-
tains 64 distinct locations spread across the United States.
Transit ASes with smaller address allocations also ad-
vertised prexes containing geographically diverse hosts:
e.g., AS 7657 (The Internet Group, a New Zealand ISP),
advertised a /24 whose IP addresses span 1,400 miles.
Because ASes (particularly US-based backbone ISPs)
often allocate sub-prefixes from a single large IP prefix, we
expected that prefixes allocated to transit ISPs are
more likely to be geographically diverse than
those allocated to ASes that do not transit traffic
for others. As shown in Figure 1(c), roughly 97% of all
prefixes announced by stub ASes (and more than 99% of
all prefixes in the /24-/31 range announced by stub ASes)
were announced from the same location.1 The remain-
1Classifying an AS as a “stub” turns out to be difficult, as acquisitions,
unorthodox transit relationships (e.g., Harvard University appears as a
transit for MIT in RouteViews), etc., preclude classifying the leaves of
the RouteViews graph as stub ASes. Instead, we classify an AS as a stub
Registry   % fragment   % discontig   % all   % used
APNIC         25.11        31.90      30.97   81.07
ARIN          43.69        30.00      27.30   85.97
LACNIC         5.70        14.99      15.89   68.49
RIPENCC       25.50        23.11      25.85   86.38
Table 4: Contribution of the various registries (Breadth dataset).
Using such allocation records, we investigated how often
fragmented allocation was the cause for ASes announcing
discontiguous prefixes. If a pair of discontiguous
prefixes is from discontiguous allocations, then we
conclude that a fragmented allocation has occurred.
Table 4 gives a registry-wise breakdown of the prefixes
from fragmented allocations, the discontiguous prefixes, and
the total number of prefixes observed. We have also
tabulated the total fraction of the address space allocated at
these registries. The table shows that LACNIC experiences
less allocation pressure and correspondingly causes fewer
fragmented allocations.
To further understand the reasons behind discontiguous
allocations, we examined the allocation patterns of the
20 〈AS,location〉 pairs in Breadth from which the largest
number of discontiguous prefixes originated. We observed
that 23% of the discontiguous allocations in these 20
〈AS,location〉 pairs were made from discontiguous spaces
on the same day, indicating that the registries were forced
to make such assignments due to the paucity of IPv4
addresses. The remaining 77% of the allocations were made
during different periods of time. Possible explanations for
discontiguous address space allocations to an AS at
different points in time are: (1) scarce IPv4 addresses are
allocated conservatively to organizations, resulting in a
fragmented set of addresses for each organization; and (2) two
or more organizations with discontiguous addresses share
one AS number due to a merger or acquisition.
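The fragmented-allocation test and the same-day versus different-day split can be sketched roughly as below. The record format (allocated block, allocation date), the function names, and the rule that two allocations count as discontiguous when their blocks are neither identical nor adjacent in address space are all assumptions for illustration; the registry records used in the study may be structured differently.

```python
import ipaddress
from datetime import date


def covering_allocation(prefix, allocations):
    """Return (block, date) for the first allocation containing `prefix`, or None."""
    net = ipaddress.ip_network(prefix)
    for block, allocated_on in allocations:
        block_net = ipaddress.ip_network(block)
        if net.subnet_of(block_net):
            return block_net, allocated_on
    return None


def classify_pair(p_i, p_j, allocations):
    """Classify a pair of discontiguous prefixes by the allocations behind them."""
    rec_i = covering_allocation(p_i, allocations)
    rec_j = covering_allocation(p_j, allocations)
    if rec_i is None or rec_j is None:
        return "unknown"
    (blk_a, day_a), (blk_b, day_b) = sorted((rec_i, rec_j), key=lambda r: r[0])
    # Assumed rule: identical or directly adjacent blocks are not fragmented.
    if blk_a == blk_b or int(blk_a.broadcast_address) + 1 == int(blk_b.network_address):
        return "not fragmented"
    if day_a == day_b:
        return "fragmented, same day"
    return "fragmented, different days"


# Hypothetical allocation records for one AS.
allocations = [
    ("10.0.0.0/16", date(2001, 1, 1)),
    ("10.8.0.0/16", date(2001, 1, 1)),
    ("10.20.0.0/16", date(2003, 5, 5)),
]
print(classify_pair("10.0.1.0/24", "10.8.1.0/24", allocations))
print(classify_pair("10.0.1.0/24", "10.20.0.0/24", allocations))
```

Under these assumptions, the "same day" outcome corresponds to the 23% case described above, and the "different days" outcome to the remaining 77%.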
4.2.2 Load balance
An AS might announce a specific subnet of a bigger prefix
in order to balance load over its two incoming links.
For example, consider an AS with prefix p_i and two
incoming links L_1 and L_2, which desires that the traffic to a
more specific (i.e., longer) prefix p_j arrive through link
L_1 and the remaining traffic through link L_2. To achieve
this goal, it announces the longer prefix p_j over link L_1
and p_i over L_2. This practice is commonly referred to
as BGP hole punching. Let D_discontig denote the set
of all discontiguous prefixes in a dataset. To determine
whether a pair of prefixes {p_i, p_j} appears in D_discontig
due to hole punching, we check if their AS announces a
supernet p_s that contains both p_i and p_j from the same
location, thus producing a discontiguous pair of prefixes. We
can observe from Table 3 that the number of discontiguous
prefixes that appear due to load balancing is negligible:
between 1.5% and 3.9% of the total number of
discontiguous prefixes.
Figure 2: Geographic diversity of contiguous prefixes announced by
the same AS. Panels: (a) Maximum Distance (CDF over distance in miles)
and (b) Diameter Ratio (CDF over diameter ratio), each with curves for
All, /8 - /15, /16 - /23, and /24 - /31. Graphs are for the Breadth
dataset; other datasets show similar results.
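The hole-punching check described in Section 4.2.2 can be sketched as follows. This is an illustrative reading, not the authors' code: the function name, the input format (a mapping from each prefix announced by one AS to the location it maps to), and the interpretation that the supernet must be announced from the same location as the pair are assumptions.

```python
import ipaddress


def is_hole_punching(p_i, p_j, announcements):
    """Return True if the AS also announces, from the same location,
    a supernet that contains both p_i and p_j.

    `announcements` maps prefix strings announced by one AS to a location label.
    """
    a = ipaddress.ip_network(p_i)
    b = ipaddress.ip_network(p_j)
    location = announcements[p_i]
    if announcements[p_j] != location:
        return False
    for prefix, loc in announcements.items():
        s = ipaddress.ip_network(prefix)
        # Skip other locations, the pair itself, and mixed IP versions.
        if loc != location or s in (a, b) or s.version != a.version:
            continue
        if b.version == s.version and a.subnet_of(s) and b.subnet_of(s):
            return True
    return False


# Hypothetical announcements from a single AS.
announcements = {
    "10.0.0.0/16": "Boston",
    "10.0.1.0/24": "Boston",
    "10.0.3.0/24": "Boston",
    "10.9.0.0/24": "Boston",
}
print(is_hole_punching("10.0.1.0/24", "10.0.3.0/24", announcements))
print(is_hole_punching("10.0.1.0/24", "10.9.0.0/24", announcements))
```

In the sample data, the first pair is covered by the announced /16 and would be attributed to hole punching; the second pair has no common announced supernet.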
4.2.3 Misclassification
As our location mapping data is incomplete, we could
have misclassified a set of contiguous prefixes as
discontiguous due to the absence of traceroutes to some prefixes.
Consider a set of contiguous prefixes {p_i, p_j, p_k}. Assume
that we have mapped p_i and p_k to a location L, but we
do not have any location for prefix p_j. Then, by observing
only prefixes p_i and p_k, we might mistakenly assume
that the AS is announcing discontiguous prefixes from the
same location. Hence, for every pair of discontiguous
prefixes {p_i, p_k} ∈ D_discontig, we check if the missing
intermediate prefixes are in fact announced by the AS in the
RouteViews table. If so, we count this as an instance of
misclassifying the pair {p_i, p_k} as discontiguous.
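One way to express this check is to compute the address gap between the two prefixes and test whether the AS's announced prefixes cover it completely. The sketch below takes that approach under stated assumptions: the function names are hypothetical, the prefixes are assumed to be disjoint and of the same IP version, and "the missing intermediate prefixes are announced" is interpreted as "every address in the gap falls inside some prefix the AS announces".

```python
import ipaddress


def gap_networks(p_i, p_k):
    """CIDR blocks covering the addresses strictly between two disjoint prefixes."""
    a, b = sorted((ipaddress.ip_network(p_i), ipaddress.ip_network(p_k)))
    first = a.broadcast_address + 1
    last = b.network_address - 1
    if first > last:  # the prefixes are adjacent, so there is no gap
        return []
    return list(ipaddress.summarize_address_range(first, last))


def is_misclassified(p_i, p_k, as_prefixes):
    """True if the whole gap between p_i and p_k is announced by the same AS.

    `as_prefixes` lists the prefixes the AS announces in the routing table.
    """
    gap = gap_networks(p_i, p_k)
    if not gap:
        return False
    # Merge announced prefixes into maximal CIDR blocks before testing coverage.
    announced = list(ipaddress.collapse_addresses(
        ipaddress.ip_network(p) for p in as_prefixes))
    return all(any(g.subnet_of(n) for n in announced) for g in gap)


# Hypothetical routing-table contents for one AS.
full_table = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24"]
sparse_table = ["10.0.0.0/24", "10.0.2.0/24"]
print(is_misclassified("10.0.0.0/24", "10.0.2.0/24", full_table))
print(is_misclassified("10.0.0.0/24", "10.0.2.0/24", sparse_table))
```

With `full_table`, the intermediate /24 is announced but was never mapped, so the pair would be counted as a misclassification; with `sparse_table`, the pair is genuinely discontiguous.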
In Table 3, we observe that the Breadth dataset has more
misclassifications than the other two. This result can be
explained by the fact that, despite tracerouting to all
advertised prefixes, we could not map all prefixes' locations due
to the limitations of undns. This limitation has a stronger
influence on Breadth (which reached 161,974 prefixes)
than on Clients (which reached 45,573).
4.3 Contiguous prefixes with multiple locations
In this section, we study the extent to which ASes
advertise contiguous IP prefixes that refer to networks in diverse
geographic locations. We found 2,281 pairs of contiguous
prefixes advertised by 384 different ASes. Of these pairs
of prefixes, about one-fourth (607) of the pairs contained
hosts in distinct geographic locations.3 This finding
suggests that the opportunities for aggregation may be less
than that implied by the CIDR Report.
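The pair counting above can be illustrated with a short sketch. It assumes that two prefixes are "contiguous" when the second begins at the address immediately after the first ends, and that each prefix has been mapped to a set of location labels; the function name and the sample data are hypothetical, and the definition of contiguity used in the study may differ.

```python
import ipaddress


def contiguous_pairs(prefix_locations):
    """Return (prefix_a, prefix_b, diverse) for each adjacent pair of prefixes.

    `prefix_locations` maps prefixes announced by one AS to sets of locations.
    `diverse` is True when the pair's hosts map to more than one location.
    """
    nets = sorted(
        ((ipaddress.ip_network(p), frozenset(locs))
         for p, locs in prefix_locations.items()),
        key=lambda item: item[0],
    )
    pairs = []
    for (a, loc_a), (b, loc_b) in zip(nets, nets[1:]):
        if int(a.broadcast_address) + 1 == int(b.network_address):
            pairs.append((str(a), str(b), len(loc_a | loc_b) > 1))
    return pairs


# Hypothetical prefixes announced by one AS.
prefix_locations = {
    "10.0.0.0/24": {"Boston"},
    "10.0.1.0/24": {"Chicago"},
    "10.0.2.0/24": {"Chicago"},
    "10.0.8.0/24": {"Denver"},
}
pairs = contiguous_pairs(prefix_locations)
diverse = sum(1 for _, _, d in pairs if d)
print(pairs)
print(diverse, "of", len(pairs), "contiguous pairs span distinct locations")
```

For reference, the figures reported above give 607 / 2,281 ≈ 26.6% of contiguous pairs with hosts in distinct locations.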
Figure 2(a) shows a CDF of the maximum distance
spanned by hosts contained within a set of contiguous pre-
3Note that this measure is also a lower bound, as certain IP prefixes that
we attributed to the same location might actually contain hosts in a dif-
ferent location that we did not probe.



