Centrality scaling in large networks.
Physical Review Letters (2010)
Abstract
Betweenness centrality lies at the core of both the transport and the structural vulnerability properties of complex networks; however, it is computationally costly, and its measurement for networks with millions of nodes is nearly impossible. By introducing a multiscale decomposition of shortest paths, we show that the contributions to betweenness coming from geodesics not longer than $L$ obey a characteristic scaling vs $L$, which can be used to predict the distribution of the full centralities. The method is also illustrated on a real-world social network of $5.5 \times 10^6$ nodes and $2.7 \times 10^7$ links.
Maria Ercsey-Ravasz and Zoltan Toroczkai
Interdisciplinary Center for Network Science and Applications (iCeNSA), Department of Physics,
University of Notre Dame, Notre Dame, IN, 46556 USA
(Dated: July 28, 2010)
PACS numbers: 89.75.Hc, 89.65.-s, 02.10.Ox
Many complex networks evolve organically without any centralized control or design, and for this reason intense research has been devoted to understanding their performance properties and, more importantly, their vulnerabilities and failure modes. In these studies, a fundamental role is played by centrality measures (originally introduced in the social sciences [1-5]), and in particular by betweenness centrality [6-9]. The betweenness centrality (BC) of a node (edge) is defined as the fraction of all geodesics (shortest paths) passing through that node (edge). Since transport tends to minimize the cost/time of the route from source to destination, geodesics, and hence centrality measures and their distributions, strongly determine overall transport performance. Interestingly, geodesics are important not only for network flows but also for structural connectivity: removing nodes (edges) with high centrality produces a rapid increase in diameter and, eventually, the structural breakup of the graph. Analyses of traffic or information flow [7, 9-14], network vulnerability in the face of attacks [15], cascading failures [16, 17], and epidemics [18] all involve betweenness calculations.
Unfortunately, computing betweenness is very costly [13, 14, 19-22], and for large networks with millions to billions of nodes it is nearly impossible; hence approximation methods are needed. Existing approximations [23, 24], however, are sampling based and ill controlled.
Here we show that when geodesics are restricted to a maximum length $L$, the corresponding range-limited $L$-betweenness (introduced by Borgatti and Everett as bounded-distance betweenness [5]) assumes, for large graphs, a characteristic scaling form as a function of $L$. This scaling can then be used to predict the betweenness distribution in the (usually unattainable) diameter limit and, to a good approximation, the ranking of nodes/edges by betweenness. Additionally, the range-limited method generates $l$-betweenness values for all nodes and edges and for all $1 \le l \le L$, providing systematic information on geodesics on all length scales. This is of interest in its own right when the transported entity has a small transmission probability (rumors, viruses) and thus a high attrition rate, so that it does not explore longer geodesics. As we show, the $L$-betweenness scaling is already achieved for relatively small $L$ values, and increasingly less new information on the BC distribution and ranking is obtained when going from $L$ to $L+1$. The computational overhead involved in the $L \to L+1$ step, however, is usually immense. The range-limited centrality algorithm presented here, even in the diameter limit ($L = D$), has no larger complexity than the fastest currently known algorithms by Brandes [19] and Newman [20], that is $O(NM)$, where $N$ is the number of nodes and $M$ is the number of (directed) edges, and it is fully parallelizable. For $L < D$ our algorithm runs sublinearly in $NM$, making it possible to study networks with millions of nodes. As an illustration, we analyzed a social network (SocNet) inferred from mobile phone trace-logs [25], having a giant cluster with N = 5,568,785 and M = 26,822,764. For this network we calculated all $L$-betweenness centralities ($L$-BCs) for all nodes and edges up to $L = 5$ in 6 days, on 10 processors. With increasing $L$ the ranking of the highest BC nodes freezes, and one can predict the top nodes early. The number of geodesics running through these nodes, however, explodes with $L$. For example, while the node with the highest centrality for $L = 4$ has 40,084,702 geodesics passing through it, for $L = 5$ it has 500,903,498.
Calculating the betweenness centrality of a node or edge in a directed graph $G(V,E)$ requires counting all-pair shortest directed paths incident on it. Here we include end-points; however, the algorithm can easily be changed to exclude them or to produce other variants. The stress centrality (SC) $S(i)$ of a node $i \in V$ is simply the sum of the numbers $\sigma_{mn}(i)$ of shortest directed paths from node $m$ to $n$ going through $i$: $S(i) = \sum_{m,n \in V} \sigma_{mn}(i)$. Betweenness centrality (BC) [6, 8] normalizes the number of paths through a node by the total number of paths ($\sigma_{mn}$) for a given source-destination pair $(m,n)$: $B(i) = \sum_{m,n \in V} \sigma_{mn}(i)/\sigma_{mn}$. Similar quantities can be defined for an edge $(j,k) \in E$: $S(j,k) = \sum_{m,n \in V} \sigma_{mn}(j,k)$ and $B(j,k) = \sum_{m,n \in V} \sigma_{mn}(j,k)/\sigma_{mn}$.
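The node definitions above can be checked directly on a small graph. The following is a minimal brute-force sketch, not the algorithm of this paper: it counts shortest paths with one breadth-first search per node and then sums over all ordered pairs. The function names and the adjacency-dictionary input format are assumptions made for illustration, and the graph is assumed to be simple (no multi-edges) and unweighted.

```python
from collections import deque


def bfs_counts(adj, src):
    """Return (dist, sigma): hop distances and shortest-path counts from src."""
    dist = {src: 0}
    sigma = {src: 1}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:              # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:     # u precedes v on a geodesic
                sigma[v] += sigma[u]
    return dist, sigma


def stress_and_betweenness(adj):
    """Brute-force S(i) and B(i), end-points included, for a directed graph.

    adj maps each node to an iterable of its out-neighbors (illustrative format).
    """
    nodes = set(adj) | {v for nbrs in adj.values() for v in nbrs}
    info = {m: bfs_counts(adj, m) for m in nodes}
    S = {i: 0 for i in nodes}
    B = {i: 0.0 for i in nodes}
    for m in nodes:
        dist_m, sigma_m = info[m]
        for n in dist_m:
            if n == m:
                continue
            for i in dist_m:
                dist_i, sigma_i = info[i]
                # i lies on an m -> n geodesic iff the distances add up.
                if n in dist_i and dist_m[i] + dist_i[n] == dist_m[n]:
                    through = sigma_m[i] * sigma_i[n]   # sigma_mn(i)
                    S[i] += through
                    B[i] += through / sigma_m[n]
    return S, B
```

The cost grows roughly as the cube of the number of nodes, so this sketch is only useful as a reference for validating faster methods on toy graphs.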
arXiv:1003.0692v2 [physics.soc-ph] 27 Jul 2010
[Figure 1: panels (a), (b), (c) showing the shells $G_0$, $G_{l-1}$, $G_l$, $G_r$, $G_{r+1}$ around a root node $i$, with nodes $j$, $k$, $m$, $n$; panel annotation: $s^r_r(i|j) = \sigma_{ij}$.]
FIG. 1: a) Shells of the $C_3$ subgraph of node $i$ (black) are colored red, blue, green. Grey elements are not part of the subgraph. b) Eq. (1) calculates the SC of a node in $G_l$ (blue) by summing the SC of all its predecessors from $G_{l-1}(i)$ (red), e.g., $s^l_l(i|j) = s^{l-1}_{l-1}(i|k) + s^{l-1}_{l-1}(i|m)$. c) Eqs. (2), (3) are based on the observations $\sigma_{in}(j,k) = s^r_r(i|j)\,\sigma_{kn}$ and $\sigma_{in}(k) = s^{r+1}_{r+1}(i|k)\,\sigma_{kn}$. Eq. (4) calculates the fixed-$l$ centralities for a node (red) in $G_r(i)$ by summing the corresponding centralities of its outgoing links (blue) in $G_{r+1}(i)$, e.g., $s^r_l(i|j) = s^{r+1}_l(i|j,k) + s^{r+1}_l(i|j,m)$.
In order to define range-limited quantities, let $s_l(j)$ and $b_l(j)$ denote the stress and betweenness centralities of a node $j$ for all-pair shortest directed paths of fixed length $l$. Then $S_L(j) = \sum_{l=1}^{L} s_l(j)$ and $B_L(j) = \sum_{l=1}^{L} b_l(j)$ represent centralities from paths not longer than $L$. Similar measures for an edge are defined in the same way. Just as virtually all centrality algorithms, our method calculates these quantities for a node $j$ for shortest directed paths all emanating from a "root" node $i$, then sums the obtained values over all $i \in V$ to get the final centralities for $j$ (similarly for edges). While the basic concept of our algorithm is similar to Brandes' [19] and Newman's [20], we derive recursions that simultaneously compute both SC and BC for all nodes and edges and for all values $l = 1, \ldots, L$. The algorithm's output thus provides detailed and systematic information about shortest paths in a graph on all length scales, offering a tool for multiscale network analysis.
The algorithm starts from a given root $i$ and builds the $L$-range subgraph $C_L$ containing all nodes that can be reached in at most $L$ steps from $i$. Only links that are part of the shortest paths starting from the root are included in $C_L$. We decompose $C_L$ into shells $G_l(i)$, each containing all the nodes at shortest path distance $l$ from the root and all incoming edges from shell $l-1$ (Fig. 1a). The root itself is considered to be shell 0, $G_0(i)$.
Let $s^r_l(i|j) = \sum_{n \in G_l} \sigma_{in}(j)$ denote the number of shortest directed paths of length $l$ from the root through node $j$ in the $r$-th shell, $j \in G_r(i)$, and let $s^r_l(i|j,k) = \sum_{n \in G_l} \sigma_{in}(j,k)$ describe the same quantity for an edge $(j,k)$ in the $r$-th shell, $(j,k) \in G_r(i)$. We define similar quantities for betweenness as $b^r_l(i|j) = \sum_{n \in G_l} \sigma_{in}(j)/\sigma_{in}$ and $b^r_l(i|j,k) = \sum_{n \in G_l} \sigma_{in}(j,k)/\sigma_{in}$. Then $s_l(j) = \sum_{i \in V} s^r_l(i|j)$ and $b_l(j) = \sum_{i \in V} b^r_l(i|j)$, with similar equations for edges. In these sums $r$ is not an independent variable: given $i$ and $j$, it is the radius of the shell $G_r(i)$ centered on $i$ and containing $j$. One can show that the following recursions hold (see also Fig. 1):
$$ s^l_l(i|j) = \sum_k s^{l-1}_{l-1}(i|k), \qquad b^l_l(i|j) = 1, \qquad (1) $$
$$ s^{r+1}_l(i|j,k) = s^{r+1}_l(i|k)\, s^r_r(i|j) / s^{r+1}_{r+1}(i|k), \qquad (2) $$
$$ b^{r+1}_l(i|j,k) = b^{r+1}_l(i|k)\, s^r_r(i|j) / s^{r+1}_{r+1}(i|k), \qquad (3) $$
$$ s^r_l(i|j) = \sum_k s^{r+1}_l(i|j,k), \qquad b^r_l(i|j) = \sum_k b^{r+1}_l(i|j,k). \qquad (4) $$
The steps below are repeated for $l = 1, \ldots, L$:
1) Build $G_l(i)$, using breadth-first search.
2) Calculate the $l$-centrality measures ($s^l_l(i|j)$, $b^l_l(i|j)$) of all nodes in $G_l(i)$.
3) Moving backwards through $r = l-1, \ldots, 1, 0$, calculate the fixed-$l$ centralities of links in $G_{r+1}(i)$ and of nodes in $G_r(i)$, using recursions (1)-(4).
Finally, return to step 1) until the last shell $G_L(i)$ is reached. In the end, we obtain the fixed-$l$ betweenness values of all nodes and edges in $C_L$. This concludes the basic algorithm, which can be modified to compute different variants of BC and SC, such as those excluding endpoints. Similar recursions can also be derived for load and closeness centrality [7, 21].
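The three steps and recursions (1)-(4) can be sketched in a few lines. This is an illustrative reading of the procedure, not the authors' implementation: the function name, the adjacency-dictionary input, and the choice to return only node values (edge values appear as the per-edge terms inside the backward sweep) are assumptions, and the graph is assumed to be simple and unweighted.

```python
from collections import defaultdict


def range_limited_centralities(adj, L):
    """Fixed-l stress s[l][j] and betweenness b[l][j], l = 1..L, end-points included.

    adj maps each node to an iterable of its out-neighbors (illustrative format).
    """
    nodes = set(adj) | {v for nbrs in adj.values() for v in nbrs}
    s = {l: defaultdict(int) for l in range(1, L + 1)}
    b = {l: defaultdict(float) for l in range(1, L + 1)}
    for root in nodes:
        sigma = {root: 1}            # s^r_r(i|j): number of geodesics root -> j
        depth = {root: 0}
        preds = defaultdict(list)    # predecessors of a node in the previous shell
        shells = [[root]]            # shells[r] lists the nodes of G_r(root)
        for l in range(1, L + 1):
            # Step 1: build shell G_l with one breadth-first layer.
            shell = []
            for j in shells[l - 1]:
                for k in adj.get(j, ()):
                    if k not in depth:
                        depth[k] = l
                        sigma[k] = 0
                        shell.append(k)
                    if depth[k] == l:
                        sigma[k] += sigma[j]      # Eq. (1), stress part
                        preds[k].append(j)
            if not shell:
                break                             # nothing left within reach
            shells.append(shell)
            # Step 2: l-centralities of the nodes in G_l (Eq. (1)).
            cur_s = {k: sigma[k] for k in shell}
            cur_b = {k: 1.0 for k in shell}
            # Step 3: sweep backwards through r = l-1, ..., 0.
            for r in range(l - 1, -1, -1):
                prev_s, prev_b = defaultdict(int), defaultdict(float)
                for k in cur_s:
                    s[l][k] += cur_s[k]
                    b[l][k] += cur_b[k]
                    for j in preds[k]:
                        # Edge terms, Eqs. (2) and (3); the integer division is
                        # exact because cur_s[k] is a multiple of sigma[k].
                        prev_s[j] += cur_s[k] * sigma[j] // sigma[k]
                        prev_b[j] += cur_b[k] * sigma[j] / sigma[k]   # summed as in Eq. (4)
                cur_s, cur_b = prev_s, prev_b
            # After the sweep only the root (shell 0) remains.
            s[l][root] += cur_s[root]
            b[l][root] += cur_b[root]
    return s, b
```

Summing `b[l][j]` over `l = 1..L` gives the cumulative $B_L(j)$. For an undirected graph stored with both edge directions, every path is counted once from each end, so the values would need to be halved to match the undirected convention.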
The $L$-betweenness values on large networks obey a scaling behavior as a function of $L$. In Fig. 2 we plot the distribution of node betweenness values measured on the Erdős-Rényi (ER) random graph [26], the Barabási-Albert (BA) scale-free model [27], the random geometric graph (RG) [28], and the large social network (SocNet) [25]. Since in large networks $B_L$ grows quickly, it is better to work with the distribution $Q_L$ of the $\ln B_L$ values than with the distribution $P_L$ of the $B_L$ values. Note, however, that $Q_L(\ln B) = B\,P_L(B)$. As shown in the insets of Fig. 2, the distributions $Q_L(\ln B)$ for different $L$ can be rescaled onto each other by plotting $Q = \sigma_L Q_L$ vs $u = [\ln(B) - \mu_L]/\sigma_L$, where $\mu_L$ and $\sigma_L$ are the mean and the standard deviation of $\ln B_L$. These networks were chosen to represent very different graph classes: the ER, BA, and SocNet have small diameters, while the RG has no shortcuts. The RG is spatially embedded ($d = 2$), unlike ER and BA; the SocNet, however, is influenced by the spatial embedding of people's motility [25]. While BA has a power-law degree distribution $P(k) \sim k^{-3}$, both ER and RG have a Poissonian $P(k)$, and the SocNet's $P(k)$ resembles a log-normal [31, 32]. Both RG and SocNet have high clustering, unlike the others.
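The rescaling used for the insets amounts to standardizing the $\ln B_L$ values. A minimal sketch follows; the function name is hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption, since the text does not specify it.

```python
import math
from statistics import fmean, pstdev


def rescale_log_betweenness(B_values):
    """Map positive betweenness values to u = (ln B - mu_L) / sigma_L.

    Returns (u_values, mu_L, sigma_L). Zero or negative entries are skipped
    because ln B is undefined for them.
    """
    logs = [math.log(B) for B in B_values if B > 0]
    mu = fmean(logs)       # mean of ln B_L
    sigma = pstdev(logs)   # population standard deviation of ln B_L (assumed)
    u = [(x - mu) / sigma for x in logs]
    return u, mu, sigma
```

A density estimated from the returned `u` values is already the rescaled curve: by the change of variables, the density of $u$ equals $\sigma_L Q_L$ evaluated at the corresponding $\ln B$, so no separate multiplication of histogram heights is needed.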
Next we show that the scaling behavior observed for range-limited centralities in large graphs is a consequence of the scaling of shell sizes, which has been shown to exist, e.g., in random graphs with arbitrary degree distributions [29, 30]. Here we present arguments for undirected, uncorrelated graphs and deal only with BC; extensions to the other centralities mentioned above are straightforward. Let us define $\langle \cdot \rangle$ as an average over all root nodes
[Figure 2: four panels (a)-(d) plotting $Q_L$ vs $\ln(B)$ for several values of $L$; insets plot the rescaled $Q$ vs $u$.]
FIG. 2: Distribution $Q_L$ of $L$-betweenness for different values of $L$. a) ER, $N = 5 \times 10^4$, $\langle k \rangle = 4$, diameter $D = 16$; b) BA, $N = 5 \times 10^4$, $m = 3$, $D = 8$; c) SocNet, N = 5,568,785, M = 26,822,764, where the distributions are fitted by a lognormal (black dashed curves); d) RG, $N = 10^4$, $\langle k \rangle = 15$, $D = 79$. The insets show the rescaled distributions, see text.
$i$ in the graph. If $z_l(i)$ denotes the number of nodes on shell $G_l(i)$, then we model the growth of shell sizes by a branching-like process $z_{l+1}(i) = z_l(i)\,\alpha_l\,[1 + \epsilon_l(i)]$, where $\alpha_l = \langle z_{l+1} \rangle / \langle z_l \rangle$ is the branching factor at the $l$-th shell and $\epsilon_l(i)$ is a per-node shell occupancy noise term, $|\epsilon_l| \ll 1$, considered to obey $\langle \epsilon_l(i) \rangle = 0$ and $\langle \epsilon_l(i)\,\epsilon_m(i) \rangle = 2 A_l \delta_{l,m}$, with $A_l$ decreasing with $l$, as supported by numerical evidence. For undirected paths we can write $b_{l+1}(j) = (1/2) \sum_{i \in V} b_{l+1}(i|j) = z_{l+1}(j) + (1/2) \sum_{m=1}^{l} \sum_{i \in G_m(j)} b^m_{l+1}(i|j) \equiv z_{l+1}(j) + (1/2)\,u_{l+1}(j)$, where we used the fact that in undirected graphs $i \in G_m(j) \Leftrightarrow j \in G_m(i)$. Note that the number of terms in the inner sum $\sum_{i \in G_m(j)} b^m_{l+1}(i|j)$ is $z_m(j)$, which is rapidly increasing with $m$, and thus $u_{l+1}(j)$ is expected to have a weak dependence on $j$. Accordingly, we may approximate $u_{l+1}(j) \simeq \sum_{m=1}^{l} \sum_{i \in G_m(j)} v^m_{l+1}(i)$, where $v^m_{l+1}(i)$ is an average betweenness computed on a shell of radius $m$ centered on node $i$: $v^m_{l+1}(i) = \left[ \sum_{k \in G_m(i)} b^m_{l+1}(i|k) \right] / z_m(i)$. Based on the observation that $\sum_{k \in G_m(i)} b^m_l(i|k) = z_l(i)$, we can write $v^m_{l+1}(i) \simeq z_{l+1}(i)/z_m(i)$. Using the recursion defined above for $z_{l+1}(i)$ as a branching process, and neglecting the small noise term, we obtain $u_{l+1}(j) \simeq \alpha_l \sum_{m=1}^{l} \sum_{i \in G_m(j)} z_l(i)/z_m(i)$. This allows us to write a recursion for $b_{l+1}(j)$ as $b_{l+1}(j) \simeq \alpha_l\,[\,b_l(j) + z_l(j)/2 + z_l(j)\,\epsilon_l(j)\,]$, which can be iterated down to $l = 1$, where $b_1(j) = z_1(j) = k_j$ is the degree of $j$:
$$ b_l(j) \simeq \beta_l\, k_j\, e^{\xi_l(j)}, \qquad (5) $$
with $\beta_l = \frac{l+1}{2} \prod_{m=1}^{l-1} \alpha_m = \frac{l+1}{2} \langle z_l \rangle / \langle k \rangle$ and $\xi_l(j) = \sum_{n=1}^{l-1} \frac{l+1-n}{l+1}\, \epsilon_n(j)$. Eq. (5) allows us to relate the statistics of
[Figure 3: six panels (a)-(f). Panels a), b) plot $b_l(j)$ and $B_l(j)$ vs $l$ (slopes 2 and 3 are marked in the log-log RG panel); panels e), f) plot ranking vs $L$.]
FIG. 3: a) $b_l$ (circles) and $B_l$ (stars) vs. $l$ for some node $j$ in SocNet (red) and ER (blue). b) Same as a) for RG, for two arbitrary nodes $i$ and $j$. $B_{L+1}$ vs. $B_L$ for c) SocNet and d) RG. Each dot corresponds to a node. Ranking by BC vs $L$ for the top 10 nodes in e) SocNet and f) RG (from Fig. 4).
fixed-$l$ betweenness to the statistics of shell occupancies. Since the noise term (calculated from per-node occupancy deviations on a shell) is independent of the root degree, the distribution of fixed-$l$ betweenness can be expressed as
$$ \rho_l(b) = \frac{1}{b} \int_1^{N-1} dk\, P(k)\, \Phi_l(\ln b - \ln \beta_l - \ln k), \qquad (6) $$
where $P(k)$ is the degree distribution and $\Phi_l(\xi)$ is the distribution of the noise $\xi_l(j)$, peaked at $\xi = 0$, with fast decaying tails and $\Phi_1(x) = \delta(x)$. From (6) it follows that the natural scaling variable for the betweenness distribution is $u = \ln b - \ln \beta_l$. An extra $l$-dependence comes from the noise through the width $\sigma_l$ of $\Phi_l$ (for $l > 1$), which can be easily accounted for by the rescaling $u \to u/\sigma_l$, $\rho_l \to \sigma_l \rho_l$, collapsing the distributions for different $l$-values onto the same functional form. As $\Phi_l$ is sharply peaked around 0, the most significant contribution to the integral (6) for a given $b$ comes from degrees $k \simeq b/\beta_l$. Since $k \ge 1$, we have a rapid decay of $\rho_l(b)$ in the range $b < \beta_l$, a maximum at $b = \beta_l k^*$, where $k^*$ is the degree at which $P(k)$ is maximum, and a sharp decay for $b > (N-1)\beta_l$.
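The prefactor $\beta_l$ entering Eqs. (5) and (6) can be checked numerically in the noise-free limit. The sketch below iterates the recursion $b_{l+1} \simeq \alpha_l [b_l + z_l/2 + z_l \epsilon_l]$ together with $z_{l+1} = z_l \alpha_l (1 + \epsilon_l)$ and compares the result with $\beta_l k_j$. The function names and the numerical values are hypothetical choices for illustration, not data from the networks studied here.

```python
import math


def iterate_fixed_l_betweenness(k, alphas, eps=None):
    """Return [b_1, b_2, ...] from the branching recursion, starting at b_1 = z_1 = k.

    alphas[n] is the branching factor alpha_{n+1}; eps (optional) holds the
    matching noise terms and defaults to zero noise.
    """
    b = z = float(k)
    out = [b]
    for idx, alpha in enumerate(alphas):
        e = eps[idx] if eps is not None else 0.0
        b = alpha * (b + z / 2 + z * e)   # recursion for b_{l+1}
        z = alpha * z * (1 + e)           # shell-size growth z_{l+1}
        out.append(b)
    return out


def beta(l, alphas):
    """beta_l = (l + 1)/2 * prod_{m=1}^{l-1} alpha_m, as defined below Eq. (5)."""
    return (l + 1) / 2 * math.prod(alphas[: l - 1])


# Hypothetical example: degree 4, branching factors 3.0, 2.5, 2.0.
alphas = [3.0, 2.5, 2.0]
values = iterate_fixed_l_betweenness(4, alphas)
for l, b_l in enumerate(values, start=1):
    assert math.isclose(b_l, beta(l, alphas) * 4)   # b_l = beta_l * k without noise
```

With zero noise the agreement is exact, which is the statement that $\xi_l(j) = 0$ in Eq. (5); passing small `eps` values shows how the deviations enter through the exponent.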
In many networks the shell size grows exponentially (ER, BA, and also the SocNet), that is, $\alpha_l \simeq \alpha = \langle z_2 \rangle / \langle k \rangle$, until $l$ reaches the average shortest path distance. This implies that $\beta_l \sim \alpha^l$ and that $b_l$ grows exponentially with $l$ (Fig. 3a). In this case, since $b_l$ is rapidly increasing with $l$, the cumulative $B_L(j) = \sum_l b_l(j)$ will be dominated by
[Figure 4: four panels (a)-(d).]
FIG. 4: Vulnerability backbone in an RG graph ($N = 5 \times 10^3$, $\langle k \rangle = 5$) for a) $L = 5$, b) $L = 15$, c) $L = 45$, d) $L = D = 195$. Darker red indicates nodes with higher $B_L$. In agreement with Fig. 3f, the VB is already well approximated at $L = 45$, panel c).
the largest $l$ values, and thus $B_L$ obeys a similar scaling, supporting the observations in Fig. 2. For pure scale-free
networks $P(k) = c\,k^{-\gamma}$, and $\rho_l(b) \propto b^{-1} (b/\beta_l)^{1-\gamma}$ for $l > 1$.
In networks where the shell size grows as a power law (spatially embedded networks without shortcuts), such as RG, roadways, etc., $\beta_l \sim l^d$, where $d$ is the embedding dimension, $b_l(j) \sim l^d$, and $B_L \sim L^{d+1}$ (Fig. 3b).
As the per-shell noise contributions to $\xi_l(j)$ coming from larger
shells decrease with increasing $l$ (their weight decreases as
$(l+1)^{-1}$, in addition to the decrease of their magnitude), the
$\xi_l(j)$ quantities rapidly converge to a constant. From (5), for a pair
of nodes $i, j$:
$\ln[b_l(i)/b_l(j)] = \ln(k_i/k_j) + \xi_l(i) - \xi_l(j)$, showing that
their relative ranking by $l$-betweenness freezes with increasing $l$.
Consequently, $B_L$ and $B_{L+1}$ become more correlated with increasing
$L$ (Fig. 3c,d) and the ranking of the nodes by their BC also freezes
(Fig. 3e,f), allowing early prediction of top betweenness nodes. Spatially
embedded networks (RG) without shortcuts represent the worst case, but
relative to their diameter the convergence of ranking is still fast
(Fig. 3f). An important application of top betweenness predictability is
determining the "vulnerability backbone" (VB) of a graph (crucial for
network defense purposes [15, 18]), which is formed by the smallest
fraction of highest betweenness nodes that make up a percolating cluster
through the network. Fig. 4 for RG (worst case) shows that the VB (red
subgraph) can already be predicted accurately from the $L = 45$
betweenness values (Fig. 4c), compared with the full betweennesses based
on the diameter $D = 195$ (Fig. 4d).
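To make the early-ranking idea concrete, the sketch below computes a range-limited betweenness by truncating the breadth-first search of a Brandes-style accumulation [19] at depth $L$. It is an illustration, not the authors' implementation, and it assumes that $B_L$ counts every ordered pair of nodes at distance at most $L$, with endpoints excluded; the function names are hypothetical.

```python
from collections import deque

def range_limited_betweenness(adj, L):
    """B_L(v): sum over ordered pairs (s, t) with d(s, t) <= L of the
    fraction of shortest s-t paths through v (endpoints excluded)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        dist = {s: 0}
        sigma = {s: 1}          # number of shortest paths from s
        preds = {s: []}         # predecessors on shortest paths
        order = []              # nodes in order of non-decreasing distance
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            if dist[u] == L:
                continue        # do not expand beyond depth L
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    sigma[w] = 0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Back-propagate pair dependencies from the outermost shell inward.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

def top_k(scores, k):
    """Set of the k highest-scoring nodes (ties broken by node label)."""
    return set(sorted(scores, key=lambda v: (-scores[v], v))[:k])

if __name__ == "__main__":
    path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
    full = range_limited_betweenness(path, 6)   # L equal to the diameter
    early = range_limited_betweenness(path, 4)
    print(top_k(early, 1) == top_k(full, 1))
```

For an undirected graph every unordered pair is counted twice, which does not affect rankings. Comparing `top_k` at increasing `L` against the full result gives a simple overlap measure of how early the top nodes are identified.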
Finally, we note that the scaling behavior can be used to provide a lower
bound $L^*$ of the diameter, from observing that finite size effects appear
when the sum of average shell sizes hits $N$:
$$\sum_{l=1}^{L^*} \langle z_l \rangle = \sum_{l=1}^{L^*} \frac{2}{l+1}\, \beta_l \langle k \rangle \simeq N .$$
This allows one to find $L^*$ from the scaling behavior of $\beta_l$. In
particular, for the SocNet $L^* = 10$.
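The bound can be evaluated by accumulating the average shell sizes until they cover the network. The sketch below assumes the relation $\langle z_l \rangle = 2 \beta_l \langle k \rangle / (l+1)$ from the sum above and takes $\beta_l$ as a user-supplied function; the exponential form used in the example and its parameter values are illustrative assumptions, not values from the paper.

```python
def mean_shell_size(l, beta_l, mean_degree):
    """<z_l> = 2 * beta_l * <k> / (l + 1), as in the sum quoted above."""
    return 2.0 * beta_l * mean_degree / (l + 1)

def diameter_lower_bound(beta, mean_degree, n_nodes, l_max=10_000):
    """Smallest L at which the cumulative mean shell size reaches n_nodes."""
    covered = 0.0
    for l in range(1, l_max + 1):
        covered += mean_shell_size(l, beta(l), mean_degree)
        if covered >= n_nodes:
            return l
    raise ValueError("shells never cover the network; check the scaling law")

if __name__ == "__main__":
    # Hypothetical exponential regime: <z_l> = <k> * alpha**(l - 1),
    # i.e. beta_l = (l + 1) / 2 * alpha**(l - 1), with alpha = 3 and <k> = 4.
    beta = lambda l: 0.5 * (l + 1) * 3.0 ** (l - 1)
    print(diameter_lower_bound(beta, 4.0, 10_000))
```

Note that this form gives $\beta_1 = 1$, consistent with $\langle z_1 \rangle = \langle k \rangle$.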
In summary, we have shown that the contributions to centrality measures
coming from different length scales of the geodesics exhibit
characteristic scaling in large graphs. Exploiting this universal property
with the methods presented here makes it possible to predict betweenness
values, distributions and ranking with relatively low computational costs.
This project was supported in part by the NSF BCS-0826958, HDTRA
201473-35045 and by the Army Research Laboratory, W911NF-09-2-0053. Views
and conclusions are those of the authors, not representing those of the
ARL or U.S. Govt.

* Electronic address: mercseyr@nd.edu
† Electronic address: toro@nd.edu

[1] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications (Cambridge Univ. Press, 1994).
[2] J. Scott, Social Network Analysis: A Handbook (Sage Publications, 1991).
[3] G. Sabidussi, Psychometrika 31, 581 (1966).
[4] N. E. Friedkin, Amer. J. of Soc. 96, 1478 (1991).
[5] S. P. Borgatti and M. G. Everett, Soc. Netw. 28, 466 (2006).
[6] L. C. Freeman, Sociometry 40, 35 (1977).
[7] S. P. Borgatti, Soc. Netw. 27, 55 (2005).
[8] J. M. Anthonisse, Tech. Rep. BN 9/71, Stichting Math. Centr., Amsterdam (1971).
[9] S. Sreenivasan et al., Phys. Rev. E 75, 036105 (2007).
[10] L. Dall'Asta et al., Theor. Comp. Sci. 355, 6 (2006).
[11] L. Dall'Asta et al., Phys. Rev. E 71, 036135 (2005).
[12] K.-I. Goh et al., Phys. Rev. Lett. 87, 278701 (2001).
[13] B. Danila et al., Phys. Rev. E 74, 046114 (2006).
[14] R. Guimerà et al., Phys. Rev. Lett. 89, 248701 (2002).
[15] P. Holme et al., Phys. Rev. E 65, 056109 (2002).
[16] A. E. Motter, Phys. Rev. Lett. 93, 098701 (2004).
[17] A. Vespignani, Science 325, 425 (2009).
[18] L. Dall'Asta et al., J. Stat. Mech. P04006 (2006).
[19] U. Brandes, J. of Math. Sociology 25, 163 (2001).
[20] M. E. J. Newman, Phys. Rev. E 64, 016132 (2001).
[21] U. Brandes, Soc. Netw. 30, 136 (2008).
[22] J. D. Noh and H. Rieger, Phys. Rev. Lett. 92, 118701 (2004).
[23] U. Brandes and C. Pich, Int. J. Bif. Chaos 17, 2303 (2007).
[24] R. Geisberger et al., in ALENEX, 90 (2008).
[25] M. C. González et al., Nature 453, 779 (2008).
[26] P. Erdős and A. Rényi, Publ. Math. Inst. Hung. Acad. Sci. 5, 17 (1960).
[27] A. L. Barabási and R. Albert, Science 286, 509 (1999).
[28] J. Dall and M. Christensen, Phys. Rev. E 66, 016121 (2002).
[29] M. E. J. Newman et al., Phys. Rev. E 64, 026118 (2001).
[30] J. Shao et al., Phys. Rev. E 80, 036105 (2009).
[31] J. P. Onnela et al., PNAS 104, 7332 (2007).
[32] M. Seshadri et al., SIGKDD-08 (2008).