1. Higher-order clustering coefficients
Austin R. Benson
Cornell University
SIAM Network Science
July 13, 2017
Joint work with
Hao Yin, Stanford
Jure Leskovec, Stanford
David Gleich, Purdue
Slides bit.ly/austin-ns17
2. The clustering coefficient is the fundamental measurement of network science
C(u) = fraction of length-2 paths centered at node u that form a triangle.
Average clustering coefficient: C = average of C(u) over all nodes u.
§ In real-world networks, C is larger than we would expect (there is clustering).
[Watts-Strogatz 1998] > 33k citations!
§ Attributed to triadic closure in sociology – a common friend provides an
opportunity for more friendships. [Simmel 1908, Rapoport 1953, Granovetter 1973]
§ Key property for generative models [Newman 2009, Seshadhri-Kolda-Pinar 2012]
§ Used as a feature for node role discovery [Henderson+ 2012]
§ Predictor of mental health [Bearman-Moody 2004]
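The definition of C(u) above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the adjacency-set representation, the function names, the toy graph, and the convention C(u) = 0 for nodes of degree below 2 are all assumptions made here.

```python
from itertools import combinations


def local_clustering(adj, u):
    """C(u): fraction of length-2 paths centered at u that form a triangle."""
    neighbors = sorted(adj[u])
    # Each unordered pair of neighbors (v, w) is a length-2 path v - u - w.
    paths = list(combinations(neighbors, 2))
    if not paths:
        return 0.0  # assumed convention for nodes of degree < 2
    closed = sum(1 for v, w in paths if w in adj[v])
    return closed / len(paths)


def average_clustering(adj):
    """C: average of C(u) over all nodes u."""
    return sum(local_clustering(adj, u) for u in adj) / len(adj)


# Hypothetical toy graph: a triangle (0, 1, 2) with node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(local_clustering(adj, 2))
print(average_clustering(adj))
```

In the toy graph, node 2 has three length-2 paths through it and only one of them closes into a triangle.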
3. The clustering coefficient is inherently limited, as it
measures the closure probability of just one simple
structure: the triangle.
There is also plenty of evidence that dense “higher-order structures”
among more than 3 nodes are important for clustering.
§ 4-cliques reveal community structure in word association and
PPI networks [Palla+ 2005]
§ 4- and 5-cliques (+ other motifs/graphlets) used to identify
network type and dimension [Yaveroğlu+ 2014, Bonato+ 2014]
§ 4-node motifs identify community structure in neural systems
[Benson-Gleich-Leskovec 2016]
4. Triangles tell just one part of the story.
How can we measure higher-order (clique) closure patterns?
5. Our higher-order view of clustering coefficients
1. Find a 2-clique 2. Attach adjacent edge 3. Check for (2 + 1)-clique
1. Find a 3-clique 2. Attach adjacent edge 3. Check for (3+1)-clique
1. Find a 4-clique 2. Attach adjacent edge 3. Check for (4+1)-clique
C2 = avg. fraction of (2-clique, adjacent edge) pairs that induce a (2+1)-clique
Increase clique size by 1 to get a higher-order clustering coefficient.
C3 = avg. fraction of (3-clique, adjacent edge) pairs that induce a (3+1)-clique
C4 = avg. fraction of (4-clique, adjacent edge) pairs that induce a (4+1)-clique
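The three steps (find an r-clique, attach an adjacent edge, check for an (r+1)-clique) can be followed literally by brute force. The sketch below is an illustration under assumptions made here: graphs are dicts of adjacency sets, a pair is "centered" at the node u shared by the clique and the adjacent edge, and the function names and toy graph are hypothetical.

```python
from itertools import combinations


def is_clique(adj, nodes):
    """True if every pair of the given nodes is connected."""
    return all(w in adj[v] for v, w in combinations(nodes, 2))


def local_hocc_by_definition(adj, u, r):
    """Count (r-clique, adjacent edge) pairs centered at u.

    Returns (closed, total): total is the number of such pairs, closed is
    the number of them that induce an (r+1)-clique.
    """
    closed = total = 0
    # Step 1: find every r-clique that contains u.
    for others in combinations(sorted(adj[u]), r - 1):
        if not is_clique(adj, others):
            continue
        clique = set(others) | {u}
        # Step 2: attach an adjacent edge (u, v) with v outside the clique.
        for v in adj[u] - clique:
            total += 1
            # Step 3: check whether the clique plus v is an (r+1)-clique.
            if all(v in adj[w] for w in others):
                closed += 1
    return closed, total


# Hypothetical toy graph: a 4-clique on {0, 1, 2, 3} plus node 4 attached to 0.
adj = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0}}
for r in (2, 3):
    closed, total = local_hocc_by_definition(adj, 0, r)
    print(r, closed, total, closed / total)
```

Note that for r = 2 each length-2 path is counted twice (once per choice of which edge plays the role of the 2-clique), which leaves the fraction unchanged.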
6. A plausible example of higher-order closure
1. Start with a group of 3 friends (Alice, Bob, Charlie)
2. One person in the group befriends someone new (Dave)
3. The group might increase in size
[Image sources: rollingstone.com, oprah.com]
7. Overview: Higher-order clustering coefficients
We generalize clustering coefficients to account for clique closure.
This particular generalization has several advantages…
1. Easy to analyze relationships between clustering at different orders.
   Results for small-world and Gn,p models, as well as general analysis.
2. New insights into data.
   Old idea: pretty much all real-world networks exhibit clustering.
   New idea: real-world networks may only cluster up to a certain order.
3. Can relate clustering coefficients to the existence of communities.
   Large higher-order clustering coefficient → can find a good “higher-order community”
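9. Local, average, and global HOCCs
Third-order local clustering coefficient at node u:
\[ C_3(u) = \frac{\#\{\text{(3-clique, adjacent edge) pairs centered at } u \text{ that induce a 4-clique}\}}{\#\{\text{(3-clique, adjacent edge) pairs centered at } u\}} \]
Third-order average clustering coefficient:
\[ \bar{C}_3 = \frac{1}{n} \sum_u C_3(u) \]
Third-order global clustering coefficient:
\[ C_3 = \frac{\sum_u \#\{\text{(3-clique, adjacent edge) pairs centered at } u \text{ that induce a 4-clique}\}}{\sum_u \#\{\text{(3-clique, adjacent edge) pairs centered at } u\}} \]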
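The average and global coefficients differ only in where the division happens: the average divides per node and then takes the mean, while the global one sums the counts first. A small sketch with made-up per-node counts (the numbers, the function name, and the convention that a node with no pairs contributes 0 are assumptions, not data from the talk):

```python
def average_and_global(counts):
    """counts maps node -> (closed, total) pair counts at that node."""
    # Average: mean of the per-node ratios (a node with no pairs counts as 0 here).
    local = [c / t if t else 0.0 for c, t in counts.values()]
    average = sum(local) / len(local)
    # Global: ratio of the summed counts.
    closed = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return average, (closed / total if total else 0.0)


# Hypothetical counts for three nodes.
counts = {"a": (1, 2), "b": (0, 10), "c": (3, 3)}
avg, glob = average_and_global(counts)
print(avg, glob)
```

Node "b" has many open pairs, so it pulls the global value down much more than the average, which weights every node equally.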
10. Analysis for small-world networks
Ring network [Watts-Strogatz 1998]: n nodes, edges to 2k neighbors, edge rewiring probability p
(illustrated with n = 16, k = 3, p = 0)
Proposition [Yin-Benson-Leskovec 2017]
With p = 0 and k < cn, as k, n → ∞,
\[ \bar{C}_2 \to \frac{3}{4}, \qquad \bar{C}_r \to \frac{1}{2} + \frac{1}{2r} \]
[Figure: average clustering coefficients \bar{C}_2, \bar{C}_3, \bar{C}_4 versus rewiring probability p (log scale, 10^{-3} to 10^{0}).]
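The r = 2 limit of 3/4 can be checked numerically on an unrewired ring (p = 0). The sketch below is an illustration only: the helper names and the choice n = 8k are assumptions made here, and it covers just the second-order coefficient at one node (all nodes are equivalent by symmetry).

```python
from itertools import combinations


def ring_lattice(n, k):
    """Ring of n nodes, each joined to its k nearest neighbors on both sides."""
    return {u: {(u + s) % n for s in range(-k, k + 1) if s != 0} for u in range(n)}


def c2(adj, u):
    """Second-order (classical) local clustering coefficient at u."""
    pairs = list(combinations(sorted(adj[u]), 2))
    return sum(1 for v, w in pairs if w in adj[v]) / len(pairs)


for k in (3, 10, 50):
    adj = ring_lattice(8 * k, k)  # n = 8k keeps the neighborhood from wrapping around
    print(k, round(c2(adj, 0), 4))
```

As k grows, the printed values move toward 3/4 from below.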
11. Analysis for Gn,p networks
Proposition [Yin-Benson-Leskovec 2017]
\[ \mathbb{E}[C_r(u) \mid C_2(u)] = (C_2(u))^{r-1} + O(1/d_u^2) \]
\[ \mathbb{E}[C_r] = \mathbb{E}[\bar{C}_r] = \mathbb{E}[C_r(u)] = p^{r-1} \]
Everything scales exponentially in the order of the clustering coefficient...
Even if a node’s neighborhood is dense, i.e., large C2(u),
higher-order clustering still decays exponentially.
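As a worked instance of the proposition, with illustrative numbers chosen here rather than taken from the talk: take edge probability p = 0.1, and separately a node with a dense neighborhood, C_2(u) = 0.5, dropping the O(1/d_u^2) term.

```latex
% Unconditional expectations for p = 0.1
\mathbb{E}[C_2] = p = 0.1, \qquad
\mathbb{E}[C_3] = p^{2} = 0.01, \qquad
\mathbb{E}[C_4] = p^{3} = 0.001

% Conditional expectations for a node with C_2(u) = 0.5
\mathbb{E}[C_3(u) \mid C_2(u) = 0.5] \approx 0.5^{2} = 0.25, \qquad
\mathbb{E}[C_4(u) \mid C_2(u) = 0.5] \approx 0.5^{3} = 0.125
```

Each step up in order multiplies the expected coefficient by another factor of p (or of C_2(u) in the conditional case), which is the exponential decay referred to above.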
12. Analysis for general networks
Proposition For any node u in any network (tight upper and lower bounds),
\[ 0 \le C_3(u) \le \sqrt{C_2(u)} \]
Examples:
C2(u) = 1, C3(u) = 1
C2(u) ≈ 1/4, C3(u) = 1/2
C2(u) ≈ 1/2, C3(u) = 0
Observation
\[ C_r(u) = \frac{r \, K_{r+1}(u)}{(d_u - r + 1) \, K_r(u)} \]
where K_a(u) is the number of a-cliques containing u and d_u is the degree of u.
(This says how to compute HOCCs by enumerating r- and (r + 1)-cliques.)
14. Average HOCCs
Network                               \bar{C}_2  \bar{C}_3  \bar{C}_4
Neural connections (C. elegans)       0.31       0.14       0.06
  Random configurations               0.15       0.04       0.01
  Random configurations (C2 fixed)    0.31       0.17       0.09
Facebook friendships (Stanford3)      0.25       0.18       0.16
  Random configurations               0.03       0.00       0.00
  Random configurations (C2 fixed)    0.25       0.14       0.09
Co-authorship (ca-AstroPh)            0.68       0.61       0.56
  Random configurations               0.01       0.00       0.00
  Random configurations (C2 fixed)    0.68       0.60       0.52
15. Concentration in random samples
[Figure: \bar{C}_3 for the neural connections network. Series: real network (C. elegans); random configurations [Bollobás 1980, Milo 2003]; random configurations with C2 fixed [Park-Newman 2004, Colomer de Simón+ 2013].]
16. Local HOCCs
[Figure: C3(u) versus C2(u), one panel each for Neural connections, Facebook friendships, and Co-authorships. Points: real network; random configuration with C2 fixed. Reference curves: Gn,p baseline; upper bound. Annotations: “Dense but nearly random regions”, “Dense and structured regions”.]
17. Global HOCCs
Network               C2    C3    C4    Trend
Neural connections    0.18  0.08  0.06  decreases with order
Facebook friendships  0.16  0.11  0.12  decreases and increases
Co-authorship         0.32  0.33  0.36  increases with order
The global HOCCs tell us something about the existence of communities…
High-degree nodes in co-authorship exhibit
clique + star structure where C3(u) > C2(u).
19. If a network has a large higher-order clustering coefficient,
then it has communities.
More precisely: then there exists at least one community
by one particular measure of “higher-order community structure”,
but we can find the community efficiently.
20. Background: graph communities and conductance
The conductance of a set of vertices S is the ratio of
edges leaving S to edge end points in S.
\[ \phi(S) = \frac{\mathrm{cut}(S)}{\mathrm{vol}(S)} = \frac{\#(\text{edges leaving } S)}{\#(\text{edge end points in } S)} \]
Small conductance ⇔ good “community”
Example: cut(S) = 7, vol(S) = 85, so \phi(S) = 7/85
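A minimal sketch of the ratio cut(S) / vol(S) as stated on this slide. The adjacency-set representation, the function name, and the toy graph are assumptions made here. Conductance is often defined with min(vol(S), vol of the complement) in the denominator; the code follows the slide's simpler form, which matches when S is the smaller side.

```python
def conductance(adj, S):
    """phi(S) = cut(S) / vol(S) for an undirected graph of adjacency sets."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)  # edges leaving S
    vol = sum(len(adj[u]) for u in S)  # edge end points in S
    return cut / vol


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(conductance(adj, {0, 1, 2}))
```

Here one edge leaves S = {0, 1, 2} and S holds seven edge end points, so the set is a good "community" by this measure.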
21. Background: motif conductance generalizes conductance to
higher-order structures like cliques [Benson-Gleich-Leskovec 2016]
Uses higher-order notions of cut and volume
Edges: cut(S) = #(edges cut), vol(S) = #(edge end points in S)
\[ \phi(S) = \frac{\mathrm{cut}(S)}{\mathrm{vol}(S)} \]
Motifs: cutM(S) = #(motifs cut), volM(S) = #(motif end points in S)
\[ \phi_M(S) = \frac{\mathrm{cut}_M(S)}{\mathrm{vol}_M(S)} \]
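For the triangle motif, the higher-order cut and volume can be computed by listing triangles and looking at how many of each triangle's nodes fall in S. This is a sketch under assumptions made here: integer node labels, adjacency sets, hypothetical function names and toy graph, and the slide's simplified ratio cutM(S) / volM(S) rather than a version that also considers the complement of S.

```python
from itertools import combinations


def triangles(adj):
    """All triangles of the graph, each listed once as (u, v, w) with u < v < w."""
    return [
        (u, v, w)
        for u in sorted(adj)
        for v, w in combinations(sorted(x for x in adj[u] if x > u), 2)
        if w in adj[v]
    ]


def triangle_conductance(adj, S):
    """cut_M(S) / vol_M(S) where the motif M is the triangle."""
    S = set(S)
    cut = vol = 0
    for tri in triangles(adj):
        inside = sum(1 for x in tri if x in S)
        vol += inside  # motif end points in S
        if 0 < inside < 3:
            cut += 1  # the triangle has nodes on both sides of S
    return cut / vol if vol else 0.0


# Hypothetical toy graph with triangles (0, 1, 2), (1, 2, 3), and (3, 4, 5).
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(triangles(adj))
print(triangle_conductance(adj, {0, 1, 2}))
```

For S = {0, 1, 2}, only the triangle (1, 2, 3) is cut, and S contains five triangle end points.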
22. Higher-order clustering → a good higher-order community
Easy to see that if Cr = 1, then the network is a union of disjoint cliques…
… and any of these cliques has optimal motif conductance = 0
Theorem [Yin-Benson-Leskovec-Gleich 2017]
There exists some node u whose 1-hop neighborhood N1(u) satisfies
\[ \phi_M(N_1(u)) \le f(C_r), \]
where M is the r-clique motif, f is monotonically decreasing,
and f(Cr) → 0 as Cr → 1.
This generalizes and improves a similar r = 2 result [Gleich-Seshadhri 12]
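The theorem suggests a simple search: score every 1-hop neighborhood and keep the best one. The sketch below does this for the triangle motif on the extreme case mentioned above, a union of disjoint cliques. It is an illustration, not the authors' algorithm: the function names are hypothetical, N1(u) is assumed to include u itself, and the score is the simplified ratio of cut triangles to triangle end points inside the set.

```python
from itertools import combinations


def triangles(adj):
    """All triangles, by checking every node triple (fine for tiny graphs)."""
    return [(u, v, w) for u, v, w in combinations(sorted(adj), 3)
            if v in adj[u] and w in adj[u] and w in adj[v]]


def best_neighborhood(adj):
    """Return (score, u) with the smallest triangle conductance over all N1(u)."""
    tris = triangles(adj)
    best = None
    for u in adj:
        S = adj[u] | {u}  # assumed: the 1-hop neighborhood includes u itself
        inside = [sum(1 for x in t if x in S) for t in tris]
        vol = sum(inside)  # triangle end points in S
        if vol == 0:
            continue  # no triangle touches this neighborhood
        cut = sum(1 for k in inside if 0 < k < 3)  # triangles straddling S
        score = cut / vol
        if best is None or score < best[0]:
            best = (score, u)
    return best


def clique(nodes):
    """Adjacency sets of a complete graph on the given nodes."""
    return {u: set(nodes) - {u} for u in nodes}


# Hypothetical example: two disjoint 4-cliques.
adj = {**clique(range(0, 4)), **clique(range(4, 8))}
print(best_neighborhood(adj))
```

Every neighborhood here is a whole clique, so no triangle is cut and the best score is 0, matching the "optimal motif conductance = 0" remark.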
23. 1-hop neighborhoods and higher-order communities
[Figure: one panel each for Neural connections, Facebook friendships, and Co-authorships. Markers: neighborhood; neighborhood with smallest conductance; Fiedler cut with motif normalized Laplacian [Benson-Gleich-Leskovec 16].]
Large C3 and several neighborhoods
with small triangle conductance
24. Related work
§ Gleich and Seshadhri, “Vertex neighborhoods, low conductance cuts, and good seeds for local
community methods”, KDD, 2012.
Motivation for relating higher-order clustering coefficients to 1-hop neighborhood communities.
Intellectually indebted for their proof techniques!
§ Benson, Gleich, and Leskovec, “Higher-order organization of complex networks,” Science, 2016.
Introduced higher-order conductance and a spectral method for optimizing it.
§ Fronczak et al., “Higher order clustering coefficients in Barabási–Albert networks.” Physica A, 2002.
Higher-order clustering by looking at shortest path lengths.
§ Jiang and Claramunt, “Topological analysis of urban street networks,” Environ. and Planning B, 2004.
Higher-order clustering by looking for triangles in k-hop neighborhoods.
§ Lambiotte et al., “Structural Transitions in Densifying Networks,” Physical Review Letters, 2016.
§ Bhat et al., “Densification and structural transitions in networks that grow by node copying,” Physical
Review E, 2016.
Generative models with similar clique closure ideas.
25. Higher-order clustering coefficients
1. A generalization of the fundamental measurement of
network science through a “clique expansion” interpretation.
2. Able to analyze generally and in common random graph
models (small-world and Gn,p).
3. Old idea: all real-world graphs cluster.
→ New idea: only cluster up to a certain order.
4. In data, helps distinguish between dense and random
(neural connections) and dense and structured (FB
friendships, co-authorship).
5. Global higher-order clustering leads to higher-order 1-hop
neighborhood communities.
Papers
§ Higher-order clustering in networks.
Yin, Benson, and Leskovec.
arXiv:1704.03913, 2017
§ Local higher-order graph clustering.
Yin, Benson, Leskovec, and Gleich.
To appear at KDD, 2017.
Thanks!
Austin R. Benson
http://cs.cornell.edu/~arb
@austinbenson
arb@cs.cornell.edu