
Modularity and community structure in networks

by M E J Newman
Proceedings of the National Academy of Sciences of the United States of America

Abstract

Many networks of interest in the sciences, including a variety of social and biological networks, are found to divide naturally into communities or modules. The problem of detecting and characterizing this community structure has attracted considerable recent attention. One of the most sensitive detection methods is optimization of the quality function known as "modularity" over the possible divisions of a network, but direct application of this method using, for instance, simulated annealing is computationally costly. Here we show that the modularity can be reformulated in terms of the eigenvectors of a new characteristic matrix for the network, which we call the modularity matrix, and that this reformulation leads to a spectral algorithm for community detection that returns results of better quality than competing methods in noticeably shorter running times. We demonstrate the algorithm with applications to several network data sets.

Available from arxiv.org

arXiv:physics/0602124v1 [physics.data-an] 17 Feb 2006

M. E. J. Newman
Department of Physics and Center for the Study of Complex Systems, Randall Laboratory, University of Michigan, Ann Arbor, MI 48109-1040

Introduction

Many systems of scientific interest can be represented as networks: sets of nodes or vertices joined in pairs by lines or edges. Examples include the Internet and the worldwide web, metabolic networks, food webs, neural networks, communication and distribution networks, and social networks. The study of networked systems has a history stretching back several centuries, but it has experienced a particular surge of interest in the last decade, especially in the mathematical sciences, partly as a result of the increasing availability of large-scale accurate data describing the topology of networks in the real world.
Statistical analyses of these data have revealed some unexpected structural features, such as high network transitivity [1], power-law degree distributions [2], and the existence of repeated local motifs [3]; see [4, 5, 6] for reviews.

One issue that has received a considerable amount of attention is the detection and characterization of community structure in networks [7, 8], meaning the appearance of densely connected groups of vertices, with only sparser connections between groups (Fig. 1). The ability to detect such groups could be of significant practical importance. For instance, groups within the worldwide web might correspond to sets of web pages on related topics [9]; groups within social networks might correspond to social units or communities [10]. Merely the finding that a network contains tightly-knit groups at all can convey useful information: if a metabolic network were divided into such groups, for instance, it could provide evidence for a modular view of the network's dynamics, with different groups of nodes performing different functions with some degree of independence [11, 12].

Past work on methods for discovering groups in networks divides into two principal lines of research, both with long histories. The first, which goes by the name of graph partitioning, has been pursued particularly in computer science and related fields, with applications in parallel computing and VLSI design, among other areas [13, 14]. The second, identified by names such as block modeling, hierarchical clustering, or community structure detection, has been pursued by sociologists and more recently also by physicists and applied mathematicians, with applications especially to social and biological networks [7, 15, 16].

FIG. 1: The vertices in many networks fall naturally into groups or communities, sets of vertices (shaded) within which there are many edges, with only a smaller number of edges between vertices of different groups.
It is tempting to suggest that these two lines of research are really addressing the same question, albeit by somewhat different means. There are, however, important differences between the goals of the two camps that make quite different technical approaches desirable. A typical problem in graph partitioning is the division of a set of tasks between the processors of a parallel computer so as to minimize the necessary amount of interprocessor communication. In such an application the number of processors is usually known in advance, as is at least an approximate figure for the number of tasks that each processor can handle. Thus we know the number and size of the groups into which the network is to be split. Also, the goal is usually to find the best division of the network regardless of whether a good division even exists; there is little point in an algorithm or method that fails to divide the network in some cases.

Community structure detection, by contrast, is perhaps best thought of as a data analysis technique used to shed light on the structure of large-scale network datasets, such as social networks, Internet and web data, or biochemical networks. Community structure methods normally assume that the network of interest divides naturally into subgroups and that the experimenter's job is to find those groups. The number and size of the groups are thus determined by the network itself and not by the experimenter. Moreover, community structure methods may explicitly admit the possibility that no good division of the network exists, an outcome that is itself considered to be of interest for the light it sheds on the topology of the network.

In this paper our focus is on community structure detection in network datasets representing real-world systems of interest. However, both the similarities and differences between community structure methods and graph partitioning will motivate many of the developments that follow.

The method of optimal modularity

Suppose then that we are given, or discover, the structure of some network and that we wish to determine whether there exists any natural division of its vertices into nonoverlapping groups or communities, where these communities may be of any size.

Let us approach this question in stages and focus initially on the problem of whether any good division of the network into just two communities exists. Perhaps the most obvious way to tackle this problem is to look for divisions of the vertices into two groups so as to minimize the number of edges running between the groups. This "minimum cut" approach is the approach adopted, virtually without exception, in the algorithms studied in the graph partitioning literature. However, as discussed above, the community structure problem differs crucially from graph partitioning in that the sizes of the communities are not normally known in advance.
If community sizes are unconstrained then we are, for instance, at liberty to select the trivial division of the network that puts all the vertices in one of our two groups and none in the other, which guarantees we will have zero intergroup edges. This division is, in a sense, optimal, but clearly it does not tell us anything of any worth. We can, if we wish, artificially forbid this solution, but then a division that puts just one vertex in one group and the rest in the other will often be optimal, and so forth.

The problem is that simply counting edges is not a good way to quantify the intuitive concept of community structure. A good division of a network into communities is not merely one in which there are few edges between communities; it is one in which there are fewer than expected edges between communities. If the number of edges between two groups is only what one would expect on the basis of random chance, then few thoughtful observers would claim this constitutes evidence of meaningful community structure. On the other hand, if the number of edges between groups is significantly less than we expect by chance (or, equivalently, if the number within groups is significantly more), then it is reasonable to conclude that something interesting is going on.

This idea, that true community structure in a network corresponds to a statistically surprising arrangement of edges, can be quantified using the measure known as modularity [17]. The modularity is, up to a multiplicative constant, the number of edges falling within groups minus the expected number in an equivalent network with edges placed at random. (A precise mathematical formulation is given below.)

The modularity can be either positive or negative, with positive values indicating the possible presence of community structure.
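To make the contrast between edge counting and "fewer than expected" concrete, here is a minimal sketch on a hypothetical toy graph: two triangles joined by a single edge. It computes the number of intergroup edges (the minimum-cut objective) and the number of within-group edges minus its expected value. For the expectation it assumes the random-placement rule $k_i k_j/2m$ that the paper states below, summed over all ordered pairs $i, j$ in the same group and halved. The graph and the function names `cut_size` and `excess_within` are hypothetical, chosen for illustration.

```python
from collections import Counter

# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]


def cut_size(edges, group):
    """Number of edges whose endpoints lie in different groups."""
    return sum(1 for u, v in edges if group[u] != group[v])


def excess_within(edges, group):
    """Within-group edges minus the number expected if edges were placed at random.

    Assumes the expected count between i and j is k_i * k_j / 2m. Summing that
    over all ordered pairs (i, j) inside a group (including i == j) and halving
    gives d_c**2 / 4m, where d_c is the total degree of group c.
    """
    m = len(edges)
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    within = sum(1 for u, v in edges if group[u] == group[v])
    group_degree = Counter()
    for vertex, k in degree.items():
        group_degree[group[vertex]] += k
    expected = sum(d * d for d in group_degree.values()) / (4 * m)
    return within - expected


trivial = {v: 1 for v in range(6)}              # every vertex in one group
natural = {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2}  # one triangle per group

for name, group in [("trivial", trivial), ("natural", natural)]:
    print(name, cut_size(EDGES, group), excess_within(EDGES, group))
```

On this graph the trivial division has cut size 0 and excess 0.0, while the split into the two triangles has cut size 1 and excess 2.5. The minimum-cut criterion prefers the useless division; the excess over random expectation prefers the natural one. Dividing the excess by $m = 7$ gives $5/14 \approx 0.357$, the value of $Q$ from eq. (1) for this two-group split.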
Thus, one can search for community structure precisely by looking for the divisions of a network that have positive, and preferably large, values of the modularity [18].

The evidence so far suggests that this is a highly effective way to tackle the problem. For instance, Guimerà and Amaral [12] and later Danon et al. [8] optimized modularity over possible partitions of computer-generated test networks using simulated annealing. In direct comparisons using standard measures, Danon et al. found that this method outperformed all other methods for community detection of which they were aware, in most cases by an impressive margin. On the basis of considerations such as these we consider maximization of the modularity to be perhaps the definitive current method of community detection, being at the same time based on sensible statistical principles and highly effective in practice.

Unfortunately, optimization by simulated annealing is not a workable approach for the large network problems facing today's scientists, because it demands too much computational effort. A number of alternative heuristic methods have been investigated, such as greedy algorithms [18] and extremal optimization [19]. Here we take a different approach based on a reformulation of the modularity in terms of the spectral properties of the network of interest.

Suppose our network contains $n$ vertices. For a particular division of the network into two groups let $s_i = 1$ if vertex $i$ belongs to group 1 and $s_i = -1$ if it belongs to group 2. And let the number of edges between vertices $i$ and $j$ be $A_{ij}$, which will normally be 0 or 1, although larger values are possible in networks where multiple edges are allowed. (The quantities $A_{ij}$ are the elements of the so-called adjacency matrix.)
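The definitions of $s_i$ and $A_{ij}$ translate directly into code. The sketch below builds $A$ as a nested list from a hypothetical edge list, counting a repeated pair as a multiple edge, and builds the index vector $s$ from a hypothetical labeling of vertices with group 1 or group 2. It assumes vertices are numbered 0 to $n - 1$ and that there are no self-loops, which the text does not discuss. The helper names are illustrative only.

```python
def adjacency_matrix(n, edges):
    """A[i][j] = number of edges between vertices i and j (multi-edges allowed).

    Assumes no self-loops (i != j for every edge).
    """
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] += 1
        A[j][i] += 1  # undirected network: keep A symmetric
    return A


def index_vector(groups):
    """s_i = +1 if vertex i is in group 1, -1 if it is in group 2."""
    return [1 if g == 1 else -1 for g in groups]


# Hypothetical example: vertices 0 and 1 are joined by two parallel edges.
A = adjacency_matrix(3, [(0, 1), (0, 1), (1, 2)])
s = index_vector([1, 1, 2])
degrees = [sum(row) for row in A]  # k_i = sum over j of A_ij
print(A)        # [[0, 2, 0], [2, 0, 1], [0, 1, 0]]
print(s)        # [1, 1, -1]
print(degrees)  # [2, 3, 1]
```

The double edge shows up as $A_{01} = A_{10} = 2$, and the row sums give the vertex degrees, so $\sum_i k_i = 6 = 2m$ for the three edges.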
At the same time, the expected number of edges between vertices $i$ and $j$ if edges are placed at random is $k_i k_j/2m$, where $k_i$ and $k_j$ are the degrees of the vertices and $m = \frac{1}{2}\sum_i k_i$ is the total number of edges in the network. Thus the modularity can be written

$$Q = \frac{1}{4m}\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right) s_i s_j = \frac{1}{4m}\,\mathbf{s}^T \mathbf{B}\,\mathbf{s}, \qquad (1)$$
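Equation (1) can be checked numerically. The sketch below, in plain Python, forms the matrix with elements $B_{ij} = A_{ij} - k_i k_j/2m$ implied by the two sides of eq. (1), then evaluates $Q = \mathbf{s}^T\mathbf{B}\mathbf{s}/4m$ on a hypothetical test graph of two triangles, {0, 1, 2} and {3, 4, 5}, joined by the edge 2-3. The function names `modularity_matrix` and `modularity` are hypothetical, and the code is an illustration of the formula rather than the paper's own implementation.

```python
def modularity_matrix(A):
    """B_ij = A_ij - k_i k_j / 2m, with k_i = sum_j A_ij and 2m = sum_i k_i."""
    k = [sum(row) for row in A]
    two_m = sum(k)
    n = len(A)
    return [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]


def modularity(A, s):
    """Q = s^T B s / 4m for a two-group division given by s_i = +1 or -1."""
    B = modularity_matrix(A)
    four_m = 2 * sum(sum(row) for row in A)  # the sum of all A_ij equals 2m
    n = len(A)
    return sum(s[i] * B[i][j] * s[j] for i in range(n) for j in range(n)) / four_m


# Hypothetical test graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
n = 6
A = [[0] * n for _ in range(n)]
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i][j] = A[j][i] = 1

print(modularity(A, [1, 1, 1, -1, -1, -1]))  # approximately 5/14, about 0.357
print(modularity(A, [1] * n))                # approximately 0 (trivial division)
```

Because every row of $\mathbf{B}$ sums to zero ($\sum_j B_{ij} = k_i - k_i \sum_j k_j/2m = 0$), putting all vertices in one group always gives $Q = 0$ up to rounding, which fits the statement above that only positive values indicate possible community structure.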