This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Community detection in networks:
Modularity optimization and maximum likelihood are equivalent

M. E. J. Newman Department of Physics and Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48109
Abstract

We demonstrate an exact equivalence between two widely used methods of community detection in networks, the method of modularity maximization in its generalized form which incorporates a resolution parameter controlling the size of the communities discovered, and the method of maximum likelihood applied to the special case of the stochastic block model known as the planted partition model, in which all communities in a network are assumed to have statistically similar properties. Among other things, this equivalence provides a mathematically principled derivation of the modularity function, clarifies the conditions and assumptions of its use, and gives an explicit formula for the optimal value of the resolution parameter.

I Introduction

Community detection, sometimes called network clustering, is the division of the nodes of an observed network into groups such that connections are dense within groups but sparser between them POM09 ; Fortunato10 ; Newman12 . Not all networks support such divisions, but many do, and the existence of good divisions is often taken as a hint of underlying semantic structure or possible mechanisms of network formation, making community detection a useful tool for interpreting network data.

The development of methods or algorithms to perform community detection on empirical networks has been a popular pursuit among researchers in physics, mathematics, statistics, and computer science—a tremendous number of such algorithms have been published in the last decade or so POM09 ; Fortunato10 ; Newman12 ; CGP11 . In this paper we study two of the most popular and widely used methods for community detection in simple undirected networks, the method of modularity maximization and the method of maximum likelihood as applied to the stochastic block model. Building on previous work by ourselves and others Newman13a ; Newman13b ; ZM14 , we show that, different though they at first appear, these two methods are in fact exactly equivalent, for appropriate choices of models and parameters, which we specify. This sheds light in particular on the modularity maximization method, which is generally motivated with heuristic arguments NG04 ; Newman04a ; RB06a (although there are some rigorous results Brandes07 ; BC09 ). Our results provide a rigorous derivation for the modularity and demonstrate that modularity maximization is optimal under appropriate conditions, but also highlight the method’s limitations. In particular, we show that modularity maximization effectively assumes that communities in a network are statistically similar, and it is not guaranteed to give good results for networks where this is not true.

II Community detection

We begin by describing the two methods of community detection that we study, in their most widely accepted forms, beginning with the method of modularity maximization.

II.1 Modularity maximization

Modularity maximization is perhaps the most widely used method for community detection for networks. It operates by defining a benefit function, called the modularity, that measures the quality of divisions of a network into communities. One optimizes this benefit function over possible divisions of the network of interest to find the one that gives the highest score, taking this to be the definitive division of the network. Since the number of possible divisions of a network is exponentially large, we normally cannot perform the optimization exhaustively, so we turn instead to approximate optimization methods, of which many have been tried, including greedy algorithms Newman04a ; CNM04 , extremal optimization DA05 , spectral relaxation Newman06b , genetic algorithms SYHF09 , simulated annealing GSA04 ; MAD05 , and belief propagation ZM14 . The popular Louvain algorithm for community detection BGLL08 , which is built into a number of network analysis software packages, uses a multiscale modularity optimization scheme and is one of the fastest community detection methods in practice.

The definition of the modularity function is straightforward NG04 . We desire a benefit function which, given a network and a candidate division of that network into groups, returns a score that is larger if the division is a “good” one and smaller if it is “bad.” The heuristic notion used to define the modularity is that a good division is one that places most of the edges of a network within groups and only a few of them between groups.

Let us represent our network by its adjacency matrix. For an undirected network of nn nodes the adjacency matrix 𝐀\mathbf{A} is the real symmetric n×nn\times n matrix with elements Aij=1A_{ij}=1 if there is an edge between nodes ii and jj and 0 otherwise. Further, let us consider a division of the network into qq nonoverlapping groups, numbered (in any order) from 1 to qq, and let us denote by gig_{i} the number of the group to which node ii is assigned. Thus the complete vector 𝐠\mathbf{g} of group assignments specifies the division of the network. Then the number of edges that fall within groups, for this particular division, is equal to 12ijAijδgigj\tfrac{1}{2}\sum_{ij}A_{ij}\delta_{g_{i}g_{j}}, where δij\delta_{ij} is the Kronecker delta and the leading factor of a half prevents double counting of edges.

The number of in-group edges alone, however, is not a good measure of the quality of a division, since it can be trivially maximized by putting all the nodes in one of the qq groups and none in any of the others. This would put 100% of edges inside groups but clearly doesn’t constitute a useful division of the network. Instead, therefore, modularity measures not just the number of edges within groups but the difference between that number and the expected number of such edges, were edges placed at random within the network.

Suppose we take our observed network and randomize the positions of its edges. We keep the total number of edges the same but we reposition them between the nodes at random, in a manner to be determined shortly. And suppose that, following this randomization, the probability that nodes ii and jj are connected by an edge is PijP_{ij}. Then the expected number of edges within groups, post-randomization, is 12ijPijδgigj\tfrac{1}{2}\sum_{ij}P_{ij}\delta_{g_{i}g_{j}}. The modularity is then proportional to the actual minus expected number of edges thus:

Q\displaystyle Q =1m(12ijAijδgigj12ijPijδgigj)\displaystyle={1\over m}\biggl{(}\tfrac{1}{2}\sum_{ij}A_{ij}\delta_{g_{i}g_{j}}-\tfrac{1}{2}\sum_{ij}P_{ij}\delta_{g_{i}g_{j}}\biggr{)}
=12mij(AijPij)δgigj\displaystyle={1\over 2m}\sum_{ij}\bigl{(}A_{ij}-P_{ij}\bigr{)}\delta_{g_{i}g_{j}} (1)

where mm is the total number of edges in the network and is included here by convention only—it makes QQ equal to a fraction of edges rather than an absolute number, which makes modularity values more easily comparable between networks of different sizes. For the purposes of maximizing the modularity, which is our main concern here, the factor of mm makes no difference at all. The position of the maximum does not depend on overall constant factors.

Note that if we now put all nodes in the same group, then δgigj=1\delta_{g_{i}g_{j}}=1 for all i,ji,j and

Q=12mij(AijPij)=0,Q={1\over 2m}\sum_{ij}\bigl{(}A_{ij}-P_{ij}\bigr{)}=0, (2)

since, as we have said, the number of edges in the network is held constant during randomization, and hence ijPij=ijAij=2m\sum_{ij}P_{ij}=\sum_{ij}A_{ij}=2m. Thus we no longer get a high modularity score for putting all nodes in a single group together. The maximum of modularity occurs for some other (nontrivial) division of the nodes, which we take to be the best division of the network. This is the method of modularity maximization.

It remains to determine what PijP_{ij} is. The value depends on the particular scheme we use to randomize the positions of the edges. The simplest scheme would be just to reposition the edges uniformly at random, every position having the same probability as every other. For a network of nn nodes there are (n2){n\choose 2} places to put an edge, and hence the probability of filling any of them with one of the mm edges is

Pij=m(n2).P_{ij}={m\over{n\choose 2}}. (3)

(Technically this is the expected number of edges not the probability, but normally m(n2)m\ll{n\choose 2} so that probability and expected number are essentially the same.)

In practice, however, this choice does not work very well because it fails to respect the degrees of the nodes in the network. The probabilities of connections between nodes depend strongly on the total number of connections nodes have—their degrees—with nodes of high degree being much more likely to be connected than nodes of low degree. For reasons that will become clear in this paper, it is important to include this effect in the definition of modularity if things are to work correctly.

Instead, therefore, we consider a constrained randomization of the edges in the network in which we preserve the degree of every node, but otherwise position the edges at random. This kind of randomization is well known in the study of networks: it gives rise to the random graph ensemble known as the configuration model MR95 ; NSW01 . After randomization, the probability of connection between two nodes is equal to

Pij=kikj2m,P_{ij}={k_{i}k_{j}\over 2m}, (4)

where ki=jAijk_{i}=\sum_{j}A_{ij} is the degree of node ii and mm is once again the number of edges in the network. (Again, this is technically the expected number of edges, but the probability and expected number are essentially the same.)

This is the choice that is most commonly used in the definition of the modularity. With this choice the modularity is given by

Q=12mij(Aijkikj2m)δgigj,Q={1\over 2m}\sum_{ij}\biggl{(}A_{ij}-{k_{i}k_{j}\over 2m}\biggr{)}\delta_{g_{i}g_{j}}, (5)

which is the form in which it is most commonly written.

There is a further twist, however, because even this definition does not always work well. As shown by Fortunato and Barthélémy FB07 , community detection by modularity maximization using the definition of (5), while it works in many situations, has one specific shortcoming: it is unable to find community structure in large networks with many small communities. In particular, if the number of communities in a network is greater than about 2m\sqrt{2m}, then the maximum modularity will not correspond to the correct division. The maximum will instead tend to combine communities into larger groups and fail to resolve the smallest divisions in the network.

To address this problem, Reichardt and Bornholdt RB06a proposed a generalized modularity function

Q(γ)=12mij(Aijγkikj2m)δgigj.Q(\gamma)={1\over 2m}\sum_{ij}\biggl{(}A_{ij}-\gamma{k_{i}k_{j}\over 2m}\biggr{)}\delta_{g_{i}g_{j}}. (6)

When the parameter γ=1\gamma=1, this is the same as the traditional modularity of Eq. (5), but other choices allow us to vary the relative weight given to the observed and randomized edge terms. If one places more weight on the observed edge term (by making γ\gamma smaller), the maximum modularity division favors, and the method therefore tends to find, larger communities. If one places more weight on the randomized edge term (larger γ\gamma), the method finds smaller communities. There has not previously been any fundamental theory dictating what value of γ\gamma one should use, but this is one of the questions on which we will shed light in this paper.

II.2 Statistical inference

The other method of community detection we consider is the method of statistical inference, as applied to the stochastic block model and its variants. With this method, one fits a generative model of a network to observed network data, and the parameters of the fit tell us about the structure of the network in much the same way that fitting a straight line through a set of data points tells us about their slope.

The model most commonly used in this context is the so-called stochastic block model, which is a random graph model of a network with community structure HLL83 ; NS01 ; BC09 . One takes some number nn of nodes, initially without any edges, and divides them into qq groups in some way, with gig_{i} being the group to which node ii is assigned, as previously. Then one places edges between nodes independently at random, with the probability, which we denote ωrs\omega_{rs}, of an edge between a particular pair of nodes depending on the groups rr and ss to which the nodes belong. Thus there is a symmetric q×qq\times q matrix of parameters ωrs\omega_{rs} which determine the probabilities of edges within and between every pair of groups. If the diagonal elements ωrr\omega_{rr} of this matrix are larger than the off-diagonal elements, then networks generated by the model have a higher probability of edges within groups than between them and hence have traditional community structure.

In fact, the stochastic block model is often studied in a slightly different formulation in which one places not just a single edge between any pair of nodes but a Poisson distributed number of edges with mean ωrs\omega_{rs}. Thus ωrs\omega_{rs} is the expected number, rather than the probability, of edges between nodes in groups rr and ss, and the networks generated by the model can in principle have multiedges, meaning there can be more than one edge between the same pair of nodes. Moreover, one typically also allows the network to contain self-edges, edges that connect a node to itself, which are also Poisson distributed in number, with mean 12ωrr\tfrac{1}{2}\omega_{rr} for a node in group rr. (The factor of half is included solely because it makes the algebra simpler.) The inclusion of multiedges and self-edges in the model is unrealistic in the sense that most network data encountered in the real world contain neither. However, most real-world networks are also very sparse, meaning that only a tiny fraction of all possible edges that could exist actually do, and hence the values of the edge probabilities ωrs\omega_{rs} are very small. In this situation, the density of multiedges and self-edges in the network will itself be small and the Poisson version of the model is virtually indistinguishable from the first (Bernoulli) version defined above. At the same time, the Poisson version is technically easier to handle than the Bernoulli version. In this paper we use the Poisson version. Similar results can be derived for the Bernoulli version, but the formulas are more complicated and the end result is little different.

The definition of the model above is in terms of its use to generate networks. As applied to community detection, however, the model is used in the “reverse” direction to infer structure by fitting it to data. In this context, one hypothesizes that an observed network, with adjacency matrix 𝐀\mathbf{A}, was generated from the stochastic block model, and attempts to work out what values of the model parameters must have been used in the generation. The parameters in this case are the edge probabilities ωrs\omega_{rs} and the group memberships gig_{i}.

Given particular values of the parameters we can write down the probability, or likelihood, that the observed network was generated from the block model thus:

P(𝐀|𝛀,𝐠)=i(12ωgigi)Aii/2(12Aii)!eωgigi/2i<jωgigjAijAij!eωgigj,P(\mathbf{A}|\mathbf{\Omega},\mathbf{g})=\prod_{i}{(\tfrac{1}{2}\omega_{g_{i}g_{i}})^{A_{ii}/2}\over(\tfrac{1}{2}A_{ii})!}\mathrm{e}^{-\omega_{g_{i}g_{i}}/2}\prod_{i<j}{\omega_{g_{i}g_{j}}^{A_{ij}}\over A_{ij}!}\mathrm{e}^{-\omega_{g_{i}g_{j}}}, (7)

where 𝛀\mathbf{\Omega} denotes the complete matrix of values ωrs\omega_{rs} and we have adopted the standard convention that a self-edge is represented by a diagonal adjacency matrix element Aii=2A_{ii}=2 (and not 1 as one might at first imagine).

The position of the maximum of this quantity with respect to 𝛀\mathbf{\Omega} and 𝐠\mathbf{g} tells us the values of the parameters most likely to have generated the observed network. Typically we are interested only in the group assignments 𝐠\mathbf{g}, which tell us how the network divides into groups. The probabilities 𝛀\mathbf{\Omega} are usually uninteresting and can be discarded once the likelihood has been maximized.

Alternatively (and usually more conveniently), we can maximize the logarithm of the likelihood:

log\displaystyle\log P(𝐀|𝛀,𝐠)\displaystyle P(\mathbf{A}|\mathbf{\Omega},\mathbf{g})
=i[12Aiilog(12ωgigj)12ωgigjlog(12Aij!)]\displaystyle=\sum_{i}\bigl{[}\tfrac{1}{2}A_{ii}\log\bigl{(}\tfrac{1}{2}\omega_{g_{i}g_{j}}\bigr{)}-\tfrac{1}{2}\omega_{g_{i}g_{j}}-\log\bigl{(}\tfrac{1}{2}A_{ij}!\bigr{)}\bigr{]}
+i<j(AijlogωgigjωgigjlogAij!).\displaystyle\qquad{}+\sum_{i<j}\bigl{(}A_{ij}\log\omega_{g_{i}g_{j}}-\omega_{g_{i}g_{j}}-\log A_{ij}!\bigr{)}. (8)

The terms 12Aiilog12\tfrac{1}{2}A_{ii}\log\tfrac{1}{2}, log(12Aij!)\log(\tfrac{1}{2}A_{ij}!), and logAij!\log A_{ij}! are all independent of the parameters and do not affect the position of the maximum, so they can be ignored, and the log-likelihood then simplifies to

logP(𝐀|𝛀,𝐠)=12ij(Aijlogωgigjωgigj).\log P(\mathbf{A}|\mathbf{\Omega},\mathbf{g})=\tfrac{1}{2}\sum_{ij}\bigl{(}A_{ij}\log\omega_{g_{i}g_{j}}-\omega_{g_{i}g_{j}}\bigr{)}. (9)

The optimal division of the network into communities is then given by maximizing this quantity with respect to both 𝐠\mathbf{g} and 𝛀\mathbf{\Omega}.

II.3 Degree-corrected block model

As with the modularity, however, this is not the whole story. This approach fares poorly when applied to most real-world networks because it doesn’t respect the node degrees in the network. The stochastic block model as described here (in either Bernoulli or Poisson versions) generates networks that have a Poisson degree distribution, which is very different from the broad distributions seen in empirical networks. This means that, typically, the model does not fit observed networks well for any choice of parameter values. It’s as if one were trying to fit a straight line through an inherently curved set of data points. Even the best fit of such a line will not be a good fit. There are no good fits when the model you are fitting is simply wrong.

The conventional solution to this problem, in the present situation, is to use a slightly different model, the degree-corrected block model KN11a , which can fit networks with any degree distribution. In this model the nodes are again assigned to groups gig_{i} and edges placed independently at random between them, but now the expected number of edges between nodes ii and jj is (kikj/2m)ωrs(k_{i}k_{j}/2m)\omega_{rs} (where as before r,sr,s are the groups to which the nodes belong and ki,kjk_{i},k_{j} are the degrees) or a half that number for self-edges. The factor of 2m2m in the denominator is optional, but convenient since, as we have said, kikj/2mk_{i}k_{j}/2m is the probability of an edge in the configuration model and hence, with this definition, ωrs\omega_{rs} quantifies the probability of edges relative to the configuration model.

Following the same line of reasoning as before, and again neglecting constants that have no effect on the position of the likelihood maximum, the log-likelihood for this model is

logP(𝐀|𝛀,𝐠)=12ij(Aijlogωgigjkikj2mωgigj).\log P(\mathbf{A}|\mathbf{\Omega},\mathbf{g})=\tfrac{1}{2}\sum_{ij}\biggl{(}A_{ij}\log\omega_{g_{i}g_{j}}-{k_{i}k_{j}\over 2m}\omega_{g_{i}g_{j}}\biggr{)}. (10)

Community detection now involves the maximization of this quantity with respect to the parameters 𝛀,𝐠\mathbf{\Omega},\mathbf{g} to find the best fit of the model to the observed network. This maximization can be achieved in a number of ways. As with the modularity, there are too many possible group assignments 𝐠\mathbf{g} to maximize exhaustively on any but the smallest of networks, but researchers have successfully applied a variety of approximate methods, including label switching algorithms BC09 , Kernighan–Lin style greedy algorithms KL70 ; KN11a , spectral methods Newman13b , Monte Carlo NS01 ; Peixoto14b , and belief propagation DKMZ11a ; Yan14 .

III The planted partition model and modularity maximization

We now come to the central result of this paper, the equivalence of modularity maximization to a particular case of the maximum likelihood method described above. We previously discussed a version of this equivalence in the context of work on spectral algorithms Newman13b ; Newman13a and it has also been discussed by Zhang and Moore ZM14 in the context of work on finite-temperature ensembles of graph partitions. Building on these works, our purpose in this paper is to make explicit the exact equivalence of the two approaches and investigate some of its consequences.

The planted partition model CK01 ; McSherry01 is a special case of the stochastic block model in which the parameters ωrs\omega_{rs} describing the community structure take only two different values:

ωrs={ωinif r=s,ωoutif rs.\omega_{rs}=\biggl{\{}\begin{array}[]{ll}\omega_{\textrm{in}}&\qquad\text{if $r=s$,}\\ \omega_{\textrm{out}}&\qquad\text{if $r\neq s$.}\end{array} (11)

This is a less flexible model than the full stochastic block model. It effectively assumes that all communities in the network are similar in the sense of having the same in-group and between-group connection rates. Nonetheless, for networks that do have this property, fits to the model should recover the community structure accurately, and indeed it has been proved that such fits are optimal in that case DKMZ11a ; Massoulie14 ; MNS15 .

In practice, if one wanted to apply the planted partition model, one should in almost all cases use a degree-corrected version of the kind described in Section II.3. Let us explore the form of the log-likelihood, Eq. (10), for such a model. Following Newman13a ; Newman13b we note that Eq. (11) implies that

ωrs\displaystyle\omega_{rs} =(ωinωout)δrs+ωout,\displaystyle=(\omega_{\textrm{in}}-\omega_{\textrm{out}})\delta_{rs}+\omega_{\textrm{out}}, (12)
logωrs\displaystyle\log\omega_{rs} =(logωinlogωout)δrs+logωout,\displaystyle=(\log\omega_{\textrm{in}}-\log\omega_{\textrm{out}})\delta_{rs}+\log\omega_{\textrm{out}}, (13)

where δrs\delta_{rs} is the Kronecker delta, as previously. Substituting these forms into Eq. (10), we find the log-likelihood for the degree-corrected planted partition model to be

logP(𝐀|𝛀,𝐠)\displaystyle\log P(\mathbf{A}|\mathbf{\Omega},\mathbf{g}) =12ijAij[δgigjlogωinωout+logωout]\displaystyle=\tfrac{1}{2}\sum_{ij}A_{ij}\biggl{[}\delta_{g_{i}g_{j}}\log{\omega_{\textrm{in}}\over\omega_{\textrm{out}}}+\log\omega_{\textrm{out}}\biggr{]}
12ijkikj2m[(ωinωout)δgigj+ωout]\displaystyle\qquad{}-\tfrac{1}{2}\sum_{ij}{k_{i}k_{j}\over 2m}\bigl{[}(\omega_{\textrm{in}}-\omega_{\textrm{out}})\delta_{g_{i}g_{j}}+\omega_{\textrm{out}}\bigr{]}
=12logωinωoutij(Aij(ωinωout)(logωinlogωout)kikj2m)δgigj\displaystyle\hskip-55.00008pt{}=\tfrac{1}{2}\log{\omega_{\textrm{in}}\over\omega_{\textrm{out}}}\sum_{ij}\biggl{(}A_{ij}-{(\omega_{\textrm{in}}-\omega_{\textrm{out}})\over(\log\omega_{\textrm{in}}-\log\omega_{\textrm{out}})}\,{k_{i}k_{j}\over 2m}\biggr{)}\delta_{g_{i}g_{j}}
+m(ωout+logωout)\displaystyle+m\bigl{(}\omega_{\textrm{out}}+\log\omega_{\textrm{out}}\bigr{)}
=B12mij(Aijγkikj2m)δgigj+C,\displaystyle\hskip-55.00008pt{}=B\,{1\over 2m}\sum_{ij}\biggl{(}A_{ij}-\gamma{k_{i}k_{j}\over 2m}\biggr{)}\delta_{g_{i}g_{j}}+C, (14)

where BB and CC are constants that depend on ωin\omega_{\textrm{in}} and ωout\omega_{\textrm{out}} but not on 𝐠\mathbf{g}, and

γ=ωinωoutlogωinlogωout.\gamma={\omega_{\textrm{in}}-\omega_{\textrm{out}}\over\log\omega_{\textrm{in}}-\log\omega_{\textrm{out}}}. (15)

We have also made use of ijAij=iki=2m\sum_{ij}A_{ij}=\sum_{i}k_{i}=2m in the second equality of (14).

To perform community detection, one would now maximize this expression with respect to both the group assignments gig_{i} and the parameters ωin\omega_{\textrm{in}} and ωout\omega_{\textrm{out}}. But suppose for a moment that we already know the correct values of ωin\omega_{\textrm{in}} and ωout\omega_{\textrm{out}}, leaving us only to maximize with respect to the group assignments. Comparing Eq. (14) with Eq. (6), we see that, apart from overall constants, (14) is precisely the generalized modularity Q(γ)Q(\gamma), and hence the likelihood and the modularity have their maxima with respect to the gig_{i} in the same place. Thus community detection by maximization of the likelihood for the planted partition model with known values of ωin\omega_{\textrm{in}} and ωout\omega_{\textrm{out}} is equivalent to maximizing the generalized modularity for the appropriate value of γ\gamma, given by Eq. (15). (We leave it as a exercise for the reader to show that a similar equivalence applies—with the same value of γ\gamma—between maximizing the likelihood for the non-degree-corrected stochastic block model and the modularity when one makes the choice (3) for PijP_{ij}.)

Among other things, this result tells us what the correct value of the resolution parameter γ\gamma is for the generalized modularity, an issue that has hitherto been undecided. The correct value is given by Eq. (15). An immediate corollary is that in most cases the conventional choice γ=1\gamma=1, corresponding to the original, non-generalized modularity function of Eq. (5), is not correct.

Unfortunately, however, we cannot normally employ Eq. (15) directly to calculate γ\gamma, since we do not know the values of the parameters ωin\omega_{\textrm{in}} and ωout\omega_{\textrm{out}}. We present one possible solution to this problem in Section III.1, but for the moment let us proceed under the assumption that we know the correct value of γ\gamma.

The equivalence between modularity maximization and maximum-likelihood methods has a number of immediate implications. First of all, it provides a derivation of the modularity that is more rigorous and principled than the usual heuristic arguments: modularity maximization (with the correct choice of γ\gamma) is equivalent to fitting a network to a degree-corrected version of the planted partition model using the method of maximum likelihood. It also explains why the standard degree-dependent choice, Eq. (4), for the definition of the modularity is better than the uniform choice of Eq. (3). It is for the same reason that the degree-corrected block model is the correct choice for the analysis of most real-world networks: the uniform choice effectively assumes a network with a Poisson degree distribution, which is a poor approximation to most empirical networks. The degree-dependent choice, by contrast, fits networks of any degree distribution.

The equivalence of modularity and maximum-likelihood methods also implies that modularity maximization is a consistent method of community detection, i.e., that under suitable conditions it will correctly and without bias find community structure where present. This follows because maximum-likelihood fits to stochastic block models are also known to be consistent: when the maximum-likelihood method is applied to networks that were themselves generated from the same block model (either the traditional or degree-corrected model), it will return correct assignments of nodes to groups in the limit of large node degrees BC09 . The consistency of modularity maximization has been demonstrated previously by other means ZLZ11 ; ZM14 , but the equivalence with likelihood maximization makes the physical intuition behind it particularly clear.

A further point of interest is that while the value of γ\gamma in Eq. (15) is always positive, regardless of the values of ωin\omega_{\textrm{in}} and ωout\omega_{\textrm{out}}, the value of the constant B=mlog(ωin/ωout)B=m\log(\omega_{\textrm{in}}/\omega_{\textrm{out}}) in Eq. (14) changes sign depending on which of ωin\omega_{\textrm{in}} and ωout\omega_{\textrm{out}} is larger. This means that maximization of the likelihood becomes equivalent to minimization of the modularity when ωout>ωin\omega_{\textrm{out}}>\omega_{\textrm{in}}, i.e., when the network has so-called disassortative structure, in which connections are more common between groups than within them. The minimization of modularity to find such structure has been proposed previously on heuristic grounds Newman06c , but the derivation here gives a rigorous foundation for the procedure.

On the other hand, the equivalence of maximum likelihood and maximum modularity methods also reveals some hidden assumptions and limitations of the modularity. The planted partition model, with its assumption, Eq. (11), that the edge parameters ωrs\omega_{rs} take the same values for every community, is less powerful than the full stochastic block model and modularity maximization is similarly less powerful as a result. In effect, modularity maximization assumes all communities in a network to be statistically similar. This may be a good assumption in some networks, but there are certainly examples where it is not, and we would expect the modularity maximization method to perform less well in such cases than more general methods.

Some variants of the maximum-likelihood method also include additional parameters that allow for heterogeneous group sizes, but the version used here, to which modularity maximization is equivalent, includes no such embellishments, meaning that modularity maximization implicitly favors groups of uniform size, which could also hurt performance.

III.1 Value of the resolution parameter

A drawback of the equivalence we have demonstrated is that it applies only when we use the correct value of the resolution parameter γ\gamma, which normally we do not know. One can, however, make an empirical estimate of the value of γ\gamma using an iterative scheme as follows.

First, one makes an initial guess about the value of γ\gamma. This guess need not be particularly accurate: γ=1\gamma=1 usually works fine. Then one finds the communities in the network by modularity maximization using this choice. This gives us some set of assignments gig_{i} of nodes to groups—likely not optimal—from which we can then make an estimate of the parameters ωin\omega_{\textrm{in}} and ωout\omega_{\textrm{out}} by noting that the expected total number of in-group edges minm_{\textrm{in}} in the (degree-corrected) planted partition model is

min=12rijkikj2mωrrδgi,rδgj,r=ωin4mrκr2,m_{\textrm{in}}=\tfrac{1}{2}\sum_{r}\sum_{ij}{k_{i}k_{j}\over 2m}\omega_{rr}\delta_{g_{i},r}\delta_{g_{j},r}={\omega_{\textrm{in}}\over 4m}\sum_{r}\kappa_{r}^{2}, (16)

where κr=ikiδgi,r\kappa_{r}=\sum_{i}k_{i}\delta_{g_{i},r} is the sum of the degrees of the nodes in group rr. Hence we can estimate ωin\omega_{\textrm{in}} from

ωin=2minrκr2/2m,\omega_{\textrm{in}}={2m_{\textrm{in}}\over\sum_{r}\kappa_{r}^{2}/2m}, (17)

using the observed value of minm_{\textrm{in}} as an estimate of the expected value. Similarly for ωout\omega_{\textrm{out}} we have

ωout=2moutrsκrκs/2m=2m2min2mrκr2/2m,\omega_{\textrm{out}}={2m_{\textrm{out}}\over\sum_{r\neq s}\kappa_{r}\kappa_{s}/2m}={2m-2m_{\textrm{in}}\over 2m-\sum_{r}\kappa_{r}^{2}/2m}, (18)

where moutm_{\textrm{out}} is the number of edges running between distinct groups.

Refer to caption
Figure 1: Resolution parameter γ\gamma, estimated using the method described here, for a set of synthetic networks with varying numbers of communities. The networks were generated using the standard (non-degree-corrected) planted partition model with qq equally sized groups of 250 nodes each and parameters ωin\omega_{\textrm{in}} and ωout\omega_{\textrm{out}} chosen so that each node has an average of 16 connections within its own group and 8 to every other group. Modularity was maximized using simulated annealing. The circles represent the estimated values of γ\gamma and the solid line represents the true values calculated from Eq. (15).

Using these estimates of ωin\omega_{\textrm{in}} and ωout\omega_{\textrm{out}} we can now calculate a new value of γ\gamma from Eq. (15). Then we repeat the process, maximizing the modularity and recalculating ωin\omega_{\textrm{in}}, ωout\omega_{\textrm{out}}, and γ\gamma until we achieve convergence. The consistency of modularity maximization, mentioned earlier, implies that this procedure should converge to the correct value of γ\gamma (and the correct community structure) for sufficiently dense networks that actually are generated from the planted partition model. For all other networks (meaning, in practice, for all real-world applications of the method) we have no formal guarantees of correctness or convergence, though the same is also true of all other methods of community detection, including, but not limited to, community detection by statistical inference.

Network nn mm qq γ\gamma
Karate club 34 78 2 0.78
Dolphin social network 62 159 2 0.59
Political blogs 1225 16780 2 0.67
Books about politics 105 441 2 0.59
Characters from Les Miserables 77 254 6 1.36
American college football 115 614 11 2.27
Jazz collaborations 198 2742 16 1.19
Email messages 1133 5451 26 3.63
Table 1: Number of nodes nn, number of edges mm, number of communities qq, and estimated value of the resolution parameter γ\gamma for a range of networks studied in the previous literature. The networks are the karate club network of Zachary Zachary77 , the dolphin social network of Lusseau et al. Lusseau03a , the network of political weblogs studied by Adamic and Glance AG05 , the network of books about politics studied by Krebs (unpublished, but see for instance Newman06b ), the network of interactions between fictional characters in the novel Les Miserables by Victor Hugo NG04 , the network of regular season games between Division I-A college football teams in the year 2000 NG04 , the network of collaborations between jazz musicians studied by Gleiser and Danon GD03 , and the network of email messages between university students of Ebel et al. EMB02 . The value of qq used for each network is the generally accepted one, except for the last two networks, for which there does not appear to be a consensus. For these two networks we estimate the number of communities using the inference method of NR16 .

One might imagine that this would not be a very efficient method for calculating γ\gamma: it requires repeated maximization of different modularity functions until the correct value of γ\gamma is reached. In practice, however, we have found that it converges very quickly. In most cases we have examined, γ\gamma is calculated to within a few percent after just one iteration, and in no case have we found a need for more than ten iterations, so the method may in fact be quite serviceable. Figure 1 shows an example application to a set of artificially generated (“synthetic”) networks for which the true value of γ\gamma is known and, as the figure shows, the algorithm is able to determine that value accurately in every case. Table 1 gives values of γ\gamma calculated using the algorithm for a number of real-world networks that have been used as test cases in previous community detection studies.

The values of γ\gamma vary in size, but there is an overall trend towards larger values in networks with larger numbers of communities, both among the synthetic networks and the real ones. This is perhaps not unexpected given that the resolution parameter γ\gamma was originally introduced precisely in order to deal with networks with larger numbers of communities. Recall that larger values of γ\gamma, and specifically values larger than the traditional value of 1, are needed in networks where the number of communities exceeds the resolution limit at 2m\sqrt{2m}. None of the networks studied here approach this limit, but nonetheless we should not find it surprising that the larger values of qq in both Fig. 1 and Table 1 are best treated using values γ>1\gamma>1.

Whether the algorithm given here is in fact a useful one in practice is a debatable point. As we have shown, it does no more than the likelihood maximization method, and the latter in principle gives better results, since it does not assume that all groups are statistically similar. Modularity maximization does have the advantage of being less nonlinear than maximum-likelihood methods, which allows for some faster algorithms such as spectral Newman06b ; Newman13b and multiscale BGLL08 algorithms. Still, the results derived here are primarily of interest not because of the algorithms they suggest, but because of the light they shed on the strengths and weaknesses of modularity maximization.

IV Conclusions

We have shown that modularity maximization is a special case of the maximum likelihood method of community detection, as applied to the degree-corrected planted partition model. The equivalence between the two approaches highlights some weaknesses of the modularity maximization method: the method assumes all communities to have statistically similar properties, which may not be the case, and it also contains an undetermined parameter γ\gamma. Most often this parameter is assumed to take the value 1, but we have shown that this assumption is normally not correct and given an explicit formula for the correct value, along with a simple algorithm for computing it on observed networks.

Acknowledgements.
The author thanks Aaron Clauset, Travis Martin, Cris Moore, and Cosma Shalizi for useful comments. This research was supported in part by the US National Science Foundation under grants DMS–1107796 and DMS–1407207.

References

  • (1) M. A. Porter, J.-P. Onnela, and P. J. Mucha, Communities in networks. Notices of the American Mathematical Society 56, 1082–1097 (2009).
  • (2) S. Fortunato, Community detection in graphs. Phys. Rep. 486, 75–174 (2010).
  • (3) M. E. J. Newman, Communities, modules and large-scale structure in networks. Nature Physics 8, 25–31 (2012).
  • (4) M. Coscia, F. Giannotti, and D. Pedreschi, A classification for community discovery methods in complex networks. Statistical Analysis and Data Mining 4, 512–546 (2011).
  • (5) M. E. J. Newman, Community detection and graph partitioning. Europhys. Lett. 103, 28003 (2013).
  • (6) M. E. J. Newman, Spectral methods for network community detection and graph partitioning. Phys. Rev. E 88, 042822 (2013).
  • (7) P. Zhang and C. Moore, Scalable detection of statistically significant communities and hierarchies, using message passing for modularity. Proc. Natl. Acad. Sci. USA 111, 18144–18149 (2014).
  • (8) M. E. J. Newman and M. Girvan, Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113 (2004).
  • (9) M. E. J. Newman, Fast algorithm for detecting community structure in networks. Phys. Rev. E 69, 066133 (2004).
  • (10) J. Reichardt and S. Bornholdt, Statistical mechanics of community detection. Phys. Rev. E 74, 016110 (2006).
  • (11) U. Brandes, D. Delling, M. Gaertler, R. Görke, M. Hoefer, Z. Nikoloski, and D. Wagner, On finding graph clusterings with maximum modularity. In Proceedings of the 33rd International Workshop on Graph-Theoretic Concepts in Computer Science, number 4769 in Lecture Notes in Computer Science, Springer, Berlin (2007).
  • (12) P. J. Bickel and A. Chen, A nonparametric view of network models and Newman–Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106, 21068–21073 (2009).
  • (13) A. Clauset, M. E. J. Newman, and C. Moore, Finding community structure in very large networks. Phys. Rev. E 70, 066111 (2004).
  • (14) J. Duch and A. Arenas, Community detection in complex networks using extremal optimization. Phys. Rev. E 72, 027104 (2005).
  • (15) M. E. J. Newman, Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 103, 8577–8582 (2006).
  • (16) L. Shuzhuo, C. Yinghui, D. Haifeng, and M. W. Feldman, A genetic algorithm with local search strategy for improved detection of community structure. Complexity 15(4), 53–60 (2010).
  • (17) R. Guimerà, M. Sales-Pardo, and L. A. N. Amaral, Modularity from fluctuations in random graphs and complex networks. Phys. Rev. E 70, 025101 (2004).
  • (18) A. Medus, G. Acuña, and C. O. Dorso, Detection of community structures in networks via global optimization. Physica A 358, 593–604 (2005).
  • (19) V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, Fast unfolding of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).
  • (20) M. Molloy and B. Reed, A critical point for random graphs with a given degree sequence. Random Structures and Algorithms 6, 161–179 (1995).
  • (21) M. E. J. Newman, S. H. Strogatz, and D. J. Watts, Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E 64, 026118 (2001).
  • (22) S. Fortunato and M. Barthélemy, Resolution limit in community detection. Proc. Natl. Acad. Sci. USA 104, 36–41 (2007).
  • (23) P. W. Holland, K. B. Laskey, and S. Leinhardt, Stochastic blockmodels: Some first steps. Social Networks 5, 109–137 (1983).
  • (24) K. Nowicki and T. A. B. Snijders, Estimation and prediction for stochastic blockstructures. J. Amer. Stat. Assoc. 96, 1077–1087 (2001).
  • (25) B. Karrer and M. E. J. Newman, Stochastic blockmodels and community structure in networks. Phys. Rev. E 83, 016107 (2011).
  • (26) B. W. Kernighan and S. Lin, An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal 49, 291–307 (1970).
  • (27) T. P. Peixoto, Efficient monte carlo and greedy heuristic for the inference of stochastic block models. Phys. Rev. E 89, 012804 (2014).
  • (28) A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová, Inference and phase transitions in the detection of modules in sparse networks. Phys. Rev. Lett. 107, 065701 (2011).
  • (29) X. Yan, C. R. Shalizi, J. E. Jensen, F. Krzakala, C. Moore, L. Zdeborova, P. Zhang, and Y. Zhu, Model selection for degree-corrected block models. J. Stat. Mech. 2014, P05007 (2014).
  • (30) A. Condon and R. M. Karp, Algorithms for graph partitioning on the planted partition model. Random Structures and Algorithms 18, 116–140 (2001).
  • (31) F. McSherry, Spectral partitioning of random graphs. In Proceedings of the 42nd IEEE Symposium on the Foundations of Computer Science, pp. 529–537, Institute of Electrical and Electronics Engineers, New York (2001).
  • (32) L. Massoulié, Community detection thresholds and the weak ramanujan property. In Proceedings of the 46th Annual ACM Symposium on the Theory of Computing, pp. 529–694–703, Associate of Computing Machinery, New York (2014).
  • (33) E. Mossel, J. Neeman, and A. Sly, Reconstruction and estimation in the planted partition model. Probability Theory and Related Fields 162, 431–461 (2015).
  • (34) Y. Zhao, E. Levina, and J. Zhu, Consistency of community detection in networks under degree-corrected stochastic block models. Annals of Statistics 40, 2266–2292 (2011).
  • (35) M. E. J. Newman, Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74, 036104 (2006).
  • (36) W. W. Zachary, An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33, 452–473 (1977).
  • (37) D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Can geographic isolation explain this unique trait? Behavioral Ecology and Sociobiology 54, 396–405 (2003).
  • (38) L. A. Adamic and N. Glance, The political blogosphere and the 2004 US election. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem (2005).
  • (39) P. Gleiser and L. Danon, Community structure in jazz. Advances in Complex Systems 6, 565–573 (2003).
  • (40) H. Ebel, L.-I. Mielsch, and S. Bornholdt, Scale-free topology of e-mail networks. Phys. Rev. E 66, 035103 (2002).
  • (41) M. E. J. Newman and G. Reinert, Estimating the number of communities in a network. Preprint arxiv:1605.02753 (2016).