Topology inference from BGP routing dynamics
- ISBN: 158113603X
- DOI: 10.1145/637235.637239
Abstract
This paper describes a method of inferring logical relationships between network prefixes within an Autonomous System (AS) using only passive monitoring of BGP messages. By clustering these prefixes based upon similarities between their update times, we create a hierarchy linking the prefixes within the larger AS. We can frequently identify groups of prefixes routed to the same ISP Point of Presence (POP), despite the lack of identifying information in the BGP messages. Similarly, we observe disparate prefixes under common organizational control, or with long shared network paths. In addition to discovering interesting network characteristics, our passive method facilitates topology discovery by potentially reducing the number of active probes required in traditional traceroute-based Internet mapping mechanisms.
Figure 2. Generating the update vectors $u^{(t)}_{p_1}$ and $u^{(t)}_{p_2}$ from a stream of BGP updates for prefixes $(p_1, p_2)$. Time is discretized into I-second bins. $u^{(t)}_p = 1$ if and only if at least one BGP update for that prefix is received in the time window t.
A. Preprocessing
Before clustering, we group and filter the BGP updates. We first group BGP updates into time intervals of I seconds. We then filter out massive updates, which were typically caused by BGP session resets near our monitoring host. The information contained in massive updates is very limited: it reflects the measuring network’s topology and immediate upstream connections, not the topology of any connections deeper in the network. Because clustering scales with the number of items that have connections to each other, this filtering can greatly increase the speed of the cluster computation.
The output of the preprocessing stage is a series of update groups, each of which contains the set of prefixes updated at that time.
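The grouping and filtering step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `group_updates`, the `(timestamp, prefix)` input format, and the `max_group_size` threshold used to decide what counts as a "massive" update are all assumptions, since the text does not state how massive updates are identified.

```python
from collections import defaultdict


def group_updates(updates, interval, max_group_size):
    """Group (timestamp, prefix) pairs into bins of `interval` seconds.

    Bins containing more than `max_group_size` distinct prefixes are
    treated as massive updates and dropped. The threshold is a
    hypothetical parameter chosen for illustration.
    """
    bins = defaultdict(set)
    for timestamp, prefix in updates:
        bins[int(timestamp // interval)].add(prefix)
    return {
        t: prefixes
        for t, prefixes in sorted(bins.items())
        if len(prefixes) <= max_group_size
    }


# Toy stream: the third bin holds four prefixes and is filtered out.
updates = [
    (0, "10.0.0.0/8"), (5, "10.1.0.0/16"), (31, "10.0.0.0/8"),
    (61, "a"), (62, "b"), (63, "c"), (64, "d"),
]
groups = group_updates(updates, interval=30, max_group_size=3)
```

Each surviving entry maps a bin index to the set of prefixes updated in that bin, which matches the "update groups" described above.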
B. Clustering
Clustering requires two components. The distance metric is
a function that determines how closely two items are related.
The clustering method groups items based upon their distance.
Our primary distance metric (actually a nearness metric) is based upon the correlation between the two update streams. To compute this, we take the discretized update groups and determine the update vector $u^{(t)}_p$ for each prefix p:
$$u^{(t)}_p = \begin{cases} 1 & \text{if } p \text{ updated during interval } t \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
Figure 2 gives an example of update vector creation.
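As a small illustration of Equation (1), the sketch below turns update groups into per-prefix 0/1 vectors. The input layout (a dictionary from bin index to the set of prefixes updated in that bin) and the name `update_vectors` are assumptions made for this example.

```python
def update_vectors(groups, n_bins):
    """Build {prefix: [0 or 1, ...]} of length n_bins.

    `groups` maps a bin index t (0 <= t < n_bins) to the set of prefixes
    updated during that bin; this layout is assumed for illustration.
    """
    vectors = {}
    for t, prefixes in groups.items():
        for p in prefixes:
            # Create an all-zero vector on first sight, then mark bin t.
            vectors.setdefault(p, [0] * n_bins)[t] = 1
    return vectors


groups = {0: {"p1", "p2"}, 1: {"p1"}, 3: {"p2"}}
vectors = update_vectors(groups, n_bins=4)
```

Here `vectors["p1"]` is `[1, 1, 0, 0]` and `vectors["p2"]` is `[1, 0, 0, 1]`. A prefix that never appears in any group gets no entry at all.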
The correlation coefficient distance metric between two prefixes $(p_1, p_2)$ is the correlation coefficient between their corresponding one/zero update vectors across n time intervals:
$$C(p_1, p_2) = \frac{\frac{1}{n} \sum_{t=1}^{n} \left(u^{(t)}_{p_1} - \bar{u}_{p_1}\right)\left(u^{(t)}_{p_2} - \bar{u}_{p_2}\right)}{\sqrt{\sigma^2_{p_1}} \sqrt{\sigma^2_{p_2}}} \qquad (2)$$
$\bar{u}_{p_1}$ is the average of $u_{p_1}$, that is, the probability that the prefix will be updated in any given interval. $\sigma^2_p$ is its variance.
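A direct transcription of Equation (2) into code might look like the sketch below. The function name `correlation` is hypothetical, and returning 0.0 when either vector has zero variance (a prefix updated in every bin or in none) is an assumption of this example; the text does not say how that case is handled.

```python
import math


def correlation(u1, u2):
    """Correlation coefficient between two equal-length 0/1 update vectors."""
    n = len(u1)
    if n == 0 or n != len(u2):
        raise ValueError("vectors must be non-empty and of equal length")
    mean1 = sum(u1) / n
    mean2 = sum(u2) / n
    # Population covariance and variances (divide by n, as in Equation 2).
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(u1, u2)) / n
    var1 = sum((a - mean1) ** 2 for a in u1) / n
    var2 = sum((b - mean2) ** 2 for b in u2) / n
    if var1 == 0 or var2 == 0:
        return 0.0  # assumption: treat an undefined correlation as "unrelated"
    return cov / math.sqrt(var1 * var2)


same = correlation([1, 1, 0, 0], [1, 1, 0, 0])      # always updated together
opposite = correlation([1, 1, 0, 0], [0, 0, 1, 1])  # never updated together
unrelated = correlation([1, 0, 1, 0], [1, 1, 0, 0])
```

Prefixes that always update together score close to 1, prefixes that never coincide score close to -1, and the third pair above scores 0.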
We also examined a second metric, the weighted sum of the number of times that two prefixes $(p_1, p_2)$ occur in the same time interval. Each interval is normalized by the total number of updates observed in order to reduce the significance of $p_1$ and $p_2$ updates coinciding in a period of increased update traffic.
[Figure 3 panels: "Input Distances" lists A−B: 1, A−C: 2, ..., B−C: 5, D−E: 25, E−A: 84; "Resulting Cluster" is a graph over the items A, B, C, D, E.]
Figure 3. An example of the single-linkage (pairwise greedy) clustering method. The list on the left shows the 5 most closely related items, in order, and the graph on the right shows the clustering that results from this ordering.
Formally, let $|t|$ be the number of prefixes updated in time bin t. The weighted sum is:
$$S(p_1, p_2) = \sum_{t=1}^{n} \begin{cases} \frac{1}{|t|} & \text{if } u^{(t)}_{p_1} = u^{(t)}_{p_2} = 1 \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$
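The weighted sum can be computed directly from the update groups, as in the sketch below. It assumes that the groups are stored as a dictionary from bin index to the set of prefixes updated in that bin, and that the size of that set plays the role of the normalizing count for the bin; the name `weighted_sum` is hypothetical.

```python
def weighted_sum(groups, p1, p2):
    """Sum 1/|t| over every bin t in which both p1 and p2 were updated.

    `groups` maps a bin index to the set of prefixes updated in that bin,
    so len(prefixes) stands in for |t| (an assumption of this sketch).
    """
    return sum(
        1.0 / len(prefixes)
        for prefixes in groups.values()
        if p1 in prefixes and p2 in prefixes
    )


groups = {
    0: {"p1", "p2"},              # quiet bin: contributes 1/2
    1: {"p1", "p2", "p3", "p4"},  # busy bin: contributes only 1/4
    2: {"p1", "p3"},              # p2 absent: contributes nothing
}
score = weighted_sum(groups, "p1", "p2")
```

In this toy input the co-occurrence in the quiet bin counts twice as much as the one in the busy bin, giving a score of 0.75.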
We found the correlation metric superior to the weighted sum metric in almost all cases, and focus on it during our evaluation. The weighted sum normalizes against increased update traffic, but admits a small bias towards groups of prefixes that experience frequent updates. In contrast, correlation normalizes against update frequency, but does not incorporate the confidence of the correlation. In future work, we intend to address this by adding a small prior to the correlation computation: a bias towards believing that prefixes are unrelated, so that the confidence in the correlation plays a part as well.
Single-linkage clustering is a simple and efficient pairwise greedy clustering method [6]. This method first computes the $O(n^2)$ pairwise distances between objects and stores them in sorted order. It iterates through the prefix pairs from closest to farthest. When it encounters a new node in a pair, the single-linkage method joins the node to its neighbor, or to its neighbor’s cluster if the neighbor has already been clustered. If both prefixes are already in the same cluster, it does nothing. If the prefixes are in different clusters, the two clusters are merged. This process is illustrated in Figure 3.
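The pairwise greedy procedure described above can be sketched with a union-find structure. This is one possible realization, not necessarily the authors' implementation. The `max_distance` cutoff used to obtain flat clusters is a hypothetical parameter, and with a nearness metric such as correlation the scores would first have to be converted to distances (for example, one minus the correlation, which is also an assumption). The toy input reuses the pair distances listed in Figure 3.

```python
def single_linkage(distances, max_distance=None):
    """Greedy single-linkage over {(a, b): distance}.

    Returns (merges, clusters): the merges performed, in order, and the
    resulting clusters as sorted lists of items.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in distances:  # register every item, even if never merged
        find(a)
        find(b)

    merges = []
    # Visit pairs from closest to farthest.
    for (a, b), d in sorted(distances.items(), key=lambda kv: kv[1]):
        if max_distance is not None and d > max_distance:
            break
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # different clusters: merge them
            parent[root_b] = root_a
            merges.append((a, b, d))

    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    return merges, sorted(sorted(c) for c in clusters.values())


fig3 = {("A", "B"): 1, ("A", "C"): 2, ("B", "C"): 5,
        ("D", "E"): 25, ("E", "A"): 84}
merges, clusters = single_linkage(fig3, max_distance=30)
full_merges, full_clusters = single_linkage(fig3)
```

With the cutoff at 30, the pair B−C is skipped because both items are already in the same cluster, leaving the two clusters {A, B, C} and {D, E}. Without a cutoff, the final E−A pair joins everything into a single cluster, and the recorded merge order gives the hierarchy.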
This straightforward clustering requires no recomputation of the $O(n^2)$ pairwise distances between objects, so the algorithm runs in $O(n^2 \log n)$ time. While single-linkage can be prone to artifacts like long unbranched chains, we require a fast algorithm to handle the large number of objects that we cluster (1,000–110,000). Using this method, it is feasible, if time consuming, to cluster all 110,000 prefixes in the BGP routing tables. Clustering 2,000 nodes takes about 3 minutes.
III. DATA COLLECTION
We collected a large set of BGP traffic on which to perform our analysis, and a snapshot of traceroutes with which to evaluate our clustering. In this section, we describe how and when we collected routing traffic and traceroute data.



