1. Community Detection Resolution Limit Definition of resolution-free Results
Resolution-free community detection
V.A. Traag (1), P. Van Dooren (1), Y.E. Nesterov (2)
(1) ICTEAM, Université Catholique de Louvain
(2) CORE, Université Catholique de Louvain
8 April 2011
Outline
1 Community Detection
2 Resolution Limit
3 Definition of resolution-free
4 Results
Community Detection
• Detect ‘natural’ communities in network.
• Modularity approach: ‘relatively’ many links inside communities
Community Detection (formal)
• In general, communities should have relatively many present links (benefit) and few missing links (cost):
Minimize $$H = -\sum_{ij} \left( a_{ij} A_{ij} - b_{ij} (1 - A_{ij}) \right) \delta(\sigma_i, \sigma_j).$$
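• Compare to random null-model $p_{ij}$ (RB): set $a_{ij} = w_{ij} - b_{ij}$ and $b_{ij} = \gamma_{RB} p_{ij}$, so that
$$H_{RB} = -\sum_{ij} \left( A_{ij} w_{ij} - \gamma_{RB} p_{ij} \right) \delta(\sigma_i, \sigma_j).$$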
• Modularity (NG): use configuration null model $p_{ij} = \frac{k_i k_j}{2m}$.
Reichardt and Bornholdt. Phys Rev E (2006) 74:1,016110
Newman and Girvan. Phys Rev E (2004) 69:2,026113
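The relation between the general cost function and its RB special case can be checked numerically. The following is a minimal sketch, assuming an unweighted toy graph (two triangles joined by one link, chosen here for illustration only) and sums that run over all ordered pairs, including i = j; the function names are hypothetical.

```python
from itertools import product


def hamiltonian(A, a, b, sigma):
    """General form: H = -sum_ij (a_ij A_ij - b_ij (1 - A_ij)) delta(sigma_i, sigma_j)."""
    n = len(A)
    total = 0.0
    for i, j in product(range(n), repeat=2):  # all ordered pairs, i == j included
        if sigma[i] == sigma[j]:
            total += a[i][j] * A[i][j] - b[i][j] * (1 - A[i][j])
    return -total


def rb_weights(A, w, gamma):
    """RB choice of weights with the configuration null model p_ij = k_i k_j / (2m)."""
    n = len(A)
    k = [sum(row) for row in A]   # node degrees
    two_m = sum(k)                # 2m equals the sum of all degrees
    b = [[gamma * k[i] * k[j] / two_m for j in range(n)] for i in range(n)]
    a = [[w[i][j] - b[i][j] for j in range(n)] for i in range(n)]
    return a, b


def hamiltonian_rb(A, w, gamma, sigma):
    """Simplified form: H_RB = -sum_ij (A_ij w_ij - gamma p_ij) delta(sigma_i, sigma_j)."""
    n = len(A)
    k = [sum(row) for row in A]
    two_m = sum(k)
    total = 0.0
    for i, j in product(range(n), repeat=2):
        if sigma[i] == sigma[j]:
            total += A[i][j] * w[i][j] - gamma * k[i] * k[j] / two_m
    return -total


# Toy graph (illustrative assumption): triangles {0,1,2} and {3,4,5} joined by the link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1
w = [[1] * 6 for _ in range(6)]   # unit weights
split = [0, 0, 0, 1, 1, 1]        # one community per triangle
single = [0] * 6                  # everything in one community

a, b = rb_weights(A, w, 1.0)
h_general = hamiltonian(A, a, b, split)
h_direct = hamiltonian_rb(A, w, 1.0, split)
h_single = hamiltonian_rb(A, w, 1.0, single)
```

With $\gamma_{RB} = 1$ both forms give the same value for the two-triangle partition, and that value is lower than for the single-community partition, so the split is preferred on this toy graph.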
Resolution limit
• Modularity might miss ‘small’ communities.
• Merge two cliques in ring of cliques ($q$ cliques of $n_c$ nodes each) when $\gamma_{RB} < \frac{q}{n_c(n_c - 1) + 2}$.
• Depends on the total size of the graph.
• Number of communities scales as $\sqrt{\gamma_{RB} m}$.
• For general null model, problem remains since $\sum_{ij} p_{ij} = 2m$.
Fortunato and Barthélemy. PNAS (2007) 104:1, pp. 36
Kumpula et al. Eur Phys J B (2007) 56, pp. 41-45
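The merge condition for the ring of cliques can be reproduced numerically. This is a sketch under stated assumptions: unit weights, the configuration null model, sums over ordered pairs, and a comparison of only two candidate partitions (one community per clique versus neighbouring cliques paired up), not a full optimization. The helper names and the choice q = 30, n_c = 5 are illustrative.

```python
def ring_of_cliques(q, nc):
    """Adjacency matrix of q cliques of nc nodes each, joined in a ring by single links."""
    n = q * nc
    A = [[0] * n for _ in range(n)]
    for c in range(q):
        first = c * nc
        for i in range(first, first + nc):
            for j in range(first, first + nc):
                if i != j:
                    A[i][j] = 1
        u = first + nc - 1            # last node of clique c
        v = ((c + 1) % q) * nc        # first node of the next clique in the ring
        A[u][v] = A[v][u] = 1
    return A


def hamiltonian_rb(A, gamma, sigma):
    """H_RB with unit weights and p_ij = k_i k_j / (2m), summed over ordered pairs."""
    n = len(A)
    k = [sum(row) for row in A]
    two_m = sum(k)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if sigma[i] == sigma[j]:
                total += A[i][j] - gamma * k[i] * k[j] / two_m
    return -total


q, nc = 30, 5
A = ring_of_cliques(q, nc)
separate = [i // nc for i in range(q * nc)]        # one community per clique
merged = [(i // nc) // 2 for i in range(q * nc)]   # neighbouring cliques paired up
threshold = q / (nc * (nc - 1) + 2)                # 30 / 22, roughly 1.36


def prefers_merged(gamma):
    """True if the paired-up partition has lower H_RB than the clique partition."""
    return hamiltonian_rb(A, gamma, merged) < hamiltonian_rb(A, gamma, separate)
```

For this ring, `prefers_merged(1.0)` holds while `prefers_merged(1.5)` does not, in line with the threshold of about 1.36; increasing `q` raises the threshold, which is the dependence on total graph size noted above.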
Evading the resolution limit
• New model (RN) suggested: $a_{ij} = w_{ij}$ and $b_{ij} = \gamma_{RN}$, so that
$$H_{RN} = -\sum_{ij} \left( A_{ij} (w_{ij} + \gamma_{RN}) - \gamma_{RN} \right) \delta(\sigma_i, \sigma_j).$$
• Claim: no resolution limit, as merge depends only on ‘local’ variables: $\gamma_{RN} < \frac{1}{n_c^2 - 1}$.
• But taking $p_{ij} = k_i k_j$ (rescale $\gamma_{RB}$ by $2m$), we obtain $\gamma_{RB} < \frac{1}{2(n_c(n_c - 1) + 2)^2}$, which also involves only ‘local’ variables. Hence, also no resolution limit?
Ronhovde and Nussinov. Phys Rev E (2010) 81:4,046114.
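The RN merge condition can be checked in the same spirit. The sketch below assumes unit weights, sums over ordered pairs, and a ring of cliques as test graph (all illustrative choices); it compares keeping two neighbouring cliques apart against merging them, for rings of different length.

```python
def ring_of_cliques(q, nc):
    """Adjacency matrix of q cliques of nc nodes each, joined in a ring by single links."""
    n = q * nc
    A = [[0] * n for _ in range(n)]
    for c in range(q):
        first = c * nc
        for i in range(first, first + nc):
            for j in range(first, first + nc):
                if i != j:
                    A[i][j] = 1
        u = first + nc - 1            # last node of clique c
        v = ((c + 1) % q) * nc        # first node of the next clique in the ring
        A[u][v] = A[v][u] = 1
    return A


def hamiltonian_rn(A, gamma, sigma):
    """H_RN = -sum_ij (A_ij (w_ij + gamma) - gamma) delta(sigma_i, sigma_j), with w_ij = 1."""
    n = len(A)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if sigma[i] == sigma[j]:
                total += A[i][j] * (1 + gamma) - gamma
    return -total


def merge_gain(q, nc, gamma):
    """H(separate) - H(cliques 0 and 1 merged); positive means merging lowers H."""
    A = ring_of_cliques(q, nc)
    separate = [i // nc for i in range(q * nc)]
    merged = [0 if c == 1 else c for c in separate]   # relabel clique 1 into clique 0
    return hamiltonian_rn(A, gamma, separate) - hamiltonian_rn(A, gamma, merged)


nc = 3
threshold = 1 / (nc ** 2 - 1)             # 1/8 for cliques of three nodes
gain_small = merge_gain(4, nc, 0.10)      # short ring
gain_large = merge_gain(40, nc, 0.10)     # ring ten times longer
gain_above = merge_gain(4, nc, 0.15)      # gamma above the threshold
```

In this sketch the gain from merging is the same for the short and the long ring, and its sign flips at $\gamma_{RN} = 1/(n_c^2 - 1)$, which illustrates why the condition is described as depending only on ‘local’ variables.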
Problems remain
Subgraph
• Assume $p_{ij} = k_i k_j$ (rescale $\gamma_{RB}$ by $2m$).
• Then separate in large graph when $\gamma_{RB} > \frac{1}{2(n_c(n_c - 1) + 2)^2}$.
• But merged in subgraph when $\gamma_{RB} < \frac{1}{2(n_c(n_c - 1) + 1)^2}$.
Resolution limit revisited
[Figure: two panels, ‘Resolution-limit’ and ‘Resolution-free’]
• Problem is not merging per se.
• Rather, cliques separate in subgraph, but merge in large graph
(or vice versa).
• Suggests following definition.
Definition (Resolution-free)
Objective function H is called resolution-free if, whenever a partition C is optimal for G, each subpartition D ⊂ C is also optimal for the subgraph H(D) ⊂ G induced by D.
Defining resolution-free
• Implicitly defines resolution limit: method is not resolution-free.
• Some nice properties of resolution-free methods:
  – Replace optimal subpartitions
  – Never split cliques (unless in single nodes)
Main questions
• Do such methods exist?
• What conditions to impose?
General framework
General community detection
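$$H = -\sum_{ij} \left( a_{ij} A_{ij} - b_{ij} (1 - A_{ij}) \right) \delta(\sigma_i, \sigma_j)$$
RB model: set $a_{ij} = w_{ij} - b_{ij}$, $b_{ij} = \gamma_{RB} p_{ij}$.
RN model: set $a_{ij} = w_{ij}$, $b_{ij} = \gamma_{RN}$.
Simpler alternative
CPM: set $a_{ij} = w_{ij} - b_{ij}$ and $b_{ij} = \gamma$. Leads to
$$H = -\sum_{ij} \left( A_{ij} w_{ij} - \gamma \right) \delta(\sigma_i, \sigma_j).$$
Clear interpretation: $\gamma$ is minimum density of a community
$$H = -\sum_{c} \left( e_c - \gamma n_c^2 \right).$$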
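The pairwise and per-community forms of the CPM cost function can be compared directly. A minimal sketch, assuming that the pairwise sum runs over all ordered pairs including i = j, that $e_c$ is the sum of $A_{ij} w_{ij}$ over ordered pairs inside community c, and that $n_c$ is the number of nodes in c; the toy graph and function names are illustrative.

```python
def cpm_pairwise(A, w, gamma, sigma):
    """H = -sum_ij (A_ij w_ij - gamma) delta(sigma_i, sigma_j), over ordered pairs."""
    n = len(A)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if sigma[i] == sigma[j]:
                total += A[i][j] * w[i][j] - gamma
    return -total


def cpm_by_community(A, w, gamma, sigma):
    """H = -sum_c (e_c - gamma n_c^2), with e_c summed over ordered pairs inside c."""
    total = 0.0
    for c in set(sigma):
        members = [i for i in range(len(A)) if sigma[i] == c]
        e_c = sum(A[i][j] * w[i][j] for i in members for j in members)
        total += e_c - gamma * len(members) ** 2
    return -total


# Toy graph (illustrative assumption): triangles {0,1,2} and {3,4,5} joined by the link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1
w = [[1] * 6 for _ in range(6)]   # unit weights
split = [0, 0, 0, 1, 1, 1]

h_pairs = cpm_pairwise(A, w, 0.5, split)
h_comms = cpm_by_community(A, w, 0.5, split)
```

Under these conventions each triangle contributes $e_c - \gamma n_c^2 = 6 - 0.5 \cdot 9 = 1.5$, so both forms give $H = -3$. A community lowers H only when $e_c / n_c^2$ exceeds $\gamma$, which is the density reading of $\gamma$.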
Main result
Do resolution-free methods exist?
Yes: both RN and CPM are resolution-free; this follows from a general theorem.
What conditions to impose?
Sufficient condition: $a_{ij}$ and $b_{ij}$ should be ‘local’.
Definition (Local weights)
Weights $a_{ij}$, $b_{ij}$ are called local whenever, for every subgraph $H \subset G$, the weights remain similar, i.e. $a_{ij}(G) \sim a_{ij}(H)$ and $b_{ij}(G) \sim b_{ij}(H)$.
• Implies local weights $a_{ij}$ and $b_{ij}$ can only depend on node i and node j, nothing further.
• RN and CPM use local weights, hence resolution-free.
• Not a necessary condition, but there seem to be few exceptions.
• So RN and CPM are (almost) the only sensible definitions.
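The difference between local and non-local weights can be illustrated by asking the same merge question in a large graph and in an induced subgraph. The sketch below is not a proof of resolution-freeness: it assumes unit weights, compares only two candidate partitions instead of optimizing, and uses a ring of cliques with illustrative parameters and hypothetical helper names.

```python
def ring_of_cliques(q, nc):
    """Adjacency matrix of q cliques of nc nodes each, joined in a ring by single links."""
    n = q * nc
    A = [[0] * n for _ in range(n)]
    for c in range(q):
        first = c * nc
        for i in range(first, first + nc):
            for j in range(first, first + nc):
                if i != j:
                    A[i][j] = 1
        u = first + nc - 1            # last node of clique c
        v = ((c + 1) % q) * nc        # first node of the next clique in the ring
        A[u][v] = A[v][u] = 1
    return A


def hamiltonian_rb(A, gamma, sigma):
    """RB with p_ij = k_i k_j / (2m): the weights depend on the whole graph through m."""
    n = len(A)
    k = [sum(row) for row in A]
    two_m = sum(k)
    return -sum(A[i][j] - gamma * k[i] * k[j] / two_m
                for i in range(n) for j in range(n) if sigma[i] == sigma[j])


def hamiltonian_cpm(A, gamma, sigma):
    """CPM with b_ij = gamma: the weights do not depend on the rest of the graph."""
    n = len(A)
    return -sum(A[i][j] - gamma
                for i in range(n) for j in range(n) if sigma[i] == sigma[j])


def merges(hamiltonian, A, nc, gamma):
    """True if merging the first two cliques gives lower H than keeping them apart."""
    separate = [i // nc for i in range(len(A))]
    merged = [0 if c == 1 else c for c in separate]
    return hamiltonian(A, gamma, merged) < hamiltonian(A, gamma, separate)


q, nc = 30, 5
G = ring_of_cliques(q, nc)
sub = [row[:2 * nc] for row in G[:2 * nc]]   # subgraph induced by the first two cliques

rb_in_graph = merges(hamiltonian_rb, G, nc, 1.0)
rb_in_subgraph = merges(hamiltonian_rb, sub, nc, 1.0)
cpm_in_graph = merges(hamiltonian_cpm, G, nc, 0.03)
cpm_in_subgraph = merges(hamiltonian_cpm, sub, nc, 0.03)
```

In this example RB with $\gamma_{RB} = 1$ merges the two cliques inside the ring but keeps them apart in the subgraph, whereas CPM gives the same answer in both settings, since its merge decision here depends only on $\gamma$ and $n_c$.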
Performance (directed networks)
[Figure: NMI (vertical axis, 0.25 to 1) versus µ (horizontal axis, 0 to 1.0) for CPM, Infomap, Modularity and ER, with $n = 10^3$ and $n = 10^4$.]
Conclusions
• Provided definition of resolution-free.
• Methods using local weights are resolution-free.
• Clarifies link between ‘local’ methods and resolution limit.
• Only few resolution-free methods.
• Tested CPM, performs superbly.
Thank you for your attention.
Questions?
Traag, Van Dooren and Nesterov arXiv:1104.3083v1