Higher-order clustering coefficients generalize the clustering coefficient to capture clustering with respect to larger cliques (denser subgraphs) beyond triangles. The speaker defines higher-order clustering coefficients as the fraction of (r-1)-cliques paired with an adjacent edge that induce an r-clique. These coefficients reveal that real-world networks exhibit clustering to different orders and provide additional insights into network structure compared to only considering triangles. The coefficients also vary across networks such as neural, social, and collaboration networks in ways not explained by random graph models.
1. Higher-order clustering in networks
Austin R. Benson · Cornell
HONS 2018
June 8, 2018 · Paris, France
HONS'18Austin R. Benson 1
Joint work with
Hao Yin · Stanford
Jure Leskovec ·
slides ⟶ bit.ly/arb-HONS-18
code ⟶ github.com/arbenson/HigherOrderClustering.jl
2. Many networks are globally sparse but locally dense.
[Figures: coauthorship network; brain network (Sporns and Bullmore, Nature Rev. Neuro., 2012)]
Networks for real-world systems have modules, clusters, communities.
[Watts-Strogatz 98; Flake 00; Newman 04, 06; many others…]
4. The clustering coefficient is a fundamental measure in network science of how much a network clusters.
C(u) = fraction of length-2 paths centered at node u that form a triangle.
Average clustering coefficient C = mean of C(u).
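A minimal plain-Python sketch of these two definitions. It assumes an undirected simple graph stored as a dict that maps each node to its set of neighbors. The helper names are hypothetical, and giving C(u) = 0 to nodes with fewer than two neighbors is an assumed convention:

```python
from itertools import combinations

def local_clustering(adj, u):
    """Classical C(u): fraction of length-2 paths centered at u that close into a triangle."""
    nbrs = adj[u]
    d = len(nbrs)
    if d < 2:
        return 0.0  # assumption: C(u) = 0 when u has no length-2 paths
    # A length-2 path v-u-w is closed when v and w are themselves adjacent.
    closed = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
    return closed / (d * (d - 1) / 2)

def average_clustering(adj):
    """Average clustering coefficient: mean of C(u) over all nodes."""
    return sum(local_clustering(adj, u) for u in adj) / len(adj)

def build_adjacency(edges):
    """Undirected edge list -> dict of neighbor sets."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

# Toy graph: triangle {0, 1, 2} plus a pendant edge 2-3.
adj = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
print(local_clustering(adj, 2))  # one of the three length-2 paths at node 2 closes: 1/3
print(average_clustering(adj))   # (1 + 1 + 1/3 + 0) / 4 = 7/12
```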
• Data insights. The average clustering coefficient of real networks is larger than we would expect in comparable random graphs.
[Watts-Strogatz 98] > 36k citations!
• Domain phenomenon. Triadic closure in sociology.
[Simmel 1908; Rapoport 53; Granovetter 73]
• Statistical feature. Role discovery, anomaly detection, mental health studies.
[Henderson+ 12; La Fond+ 14, 16; Bearman-Moody 2004]
• Modeling tool. Key property for generative models.
[Newman 09; Seshadhri-Kolda-Pinar 12; Roble+ 16]
5. The classical clustering coefficient is limited.
The clustering coefficient measures the closure probability of just one simple structure: the triangle.
… but there is a lot of evidence that dense “higher-order structure” among more than 3 nodes is also important for clustering.
• 4-cliques reveal community structure in word association and PPI networks [Palla+ 05]
• 4-/5-cliques (+ other structure) identify network type & dimension [Yaveroğlu+ 14, Bonato+ 14]
• 4-node motifs identify community structure in neural systems [Benson-Gleich-Leskovec 16]
6. We will show that triangles are insufficient to explain clustering. We need larger cliques.
• old idea ⟶ pretty much all real-world networks exhibit clustering.
• new idea ⟶ networks may only cluster up to a certain “order”.
7. Triangles tell just one part of the story.
How do we measure clustering with respect to higher-order (clique) closure?
8. We view clustering as a clique expansion process.
• 1. Find a 2-clique ⟶ 2. Attach an adjacent edge ⟶ 3. Check for a (2+1)-clique
• 1. Find a 3-clique ⟶ 2. Attach an adjacent edge ⟶ 3. Check for a (3+1)-clique
• 1. Find a 4-clique ⟶ 2. Attach an adjacent edge ⟶ 3. Check for a (4+1)-clique
C2 = avg. fraction of (2-clique, adjacent edge) pairs that induce a (2+1)-clique.
Increase clique size by 1 to get a higher-order clustering coefficient!
C3 = avg. fraction of (3-clique, adjacent edge) pairs that induce a (3+1)-clique.
C4 = avg. fraction of (4-clique, adjacent edge) pairs that induce a (4+1)-clique.
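The three steps translate almost literally into a brute-force sketch of the local coefficient at a node u. It assumes a dict-of-neighbor-sets graph and uses hypothetical helper names. It also assumes C(u) = 0 when u has no wedges. Because it enumerates subsets of u's neighborhood, it is only meant for small graphs:

```python
from itertools import combinations

def is_clique(adj, nodes):
    """True if every pair of the given nodes is adjacent."""
    return all(b in adj[a] for a, b in combinations(nodes, 2))

def local_hocc_by_wedges(adj, u, order):
    """C_order(u), following the clique expansion process step by step:
    1. find an order-clique K containing u,
    2. attach an adjacent edge (u, v) with v outside K,
    3. check whether K plus v forms an (order + 1)-clique."""
    nbrs = sorted(adj[u])
    wedges = closed = 0
    # Each order-clique containing u is u plus (order - 1) mutually adjacent neighbors.
    for rest in combinations(nbrs, order - 1):
        if not is_clique(adj, rest):
            continue
        for v in nbrs:
            if v in rest:
                continue
            wedges += 1
            # v is already adjacent to u, so K + v is a clique iff v touches all of rest.
            if all(v in adj[w] for w in rest):
                closed += 1
    return closed / wedges if wedges else 0.0  # assumption: 0 when u has no wedges

# Hypothetical example: a 4-clique {0, 1, 2, 3} plus a pendant node 4 attached to 0.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
for order in (2, 3, 4):
    print(order, local_hocc_by_wedges(adj, 0, order))
```

For node 0 this gives C2 = 0.5, C3 = 0.5, and C4 = 0. The pendant neighbor 4 never closes a wedge, and the only 4-clique at node 0 can expand only toward node 4.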
9. We can think of higher-order closure processes in everyday life.
1. Start with a group of 3 friends (Alice, Bob, Charlie).
2. One person in the group befriends someone new (Dave).
3. The group might increase in size.
(Images: rollingstone.com, oprah.com)
10. Higher-order clustering coefficients offer several advantages.
Theory & analysis.
• Better understanding of small-world and Gn,p random graph models.
• Extremal combinatorics for general graphs.
Data Insights.
• old idea ⟶ pretty much all real-world networks exhibit clustering.
• new idea ⟶ real-world networks may only cluster up to a certain order.
11. Background: local, average, and global clustering coefficients.
• Second-order (classical) local clustering coefficient at node u.
• Second-order (classical) global clustering coefficient.
• Second-order (classical) average clustering coefficient.
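The formulas on this slide did not survive extraction. Below is a reconstruction consistent with the wedge definition on slide 8. A 2-wedge at u is an ordered pair (edge (u, w), adjacent edge (u, v)), so u has d_u(d_u − 1) of them. Here K_3(u) counts triangles containing u, following the Ka(u) notation of slide 17. |K_3| is the number of triangles and W_2 is the set of all 2-wedges. The avg/glob superscripts are notation chosen for this sketch:

```latex
\begin{aligned}
C_2(u) &= \frac{\#\,\text{closed 2-wedges centered at } u}{\#\,\text{2-wedges centered at } u}
        = \frac{2\,K_3(u)}{d_u(d_u - 1)} = \frac{K_3(u)}{\binom{d_u}{2}}, \\[4pt]
C_2^{\mathrm{avg}} &= \frac{1}{|V|} \sum_{u \in V} C_2(u), \\[4pt]
C_2^{\mathrm{glob}} &= \frac{\sum_{u \in V} 2\,K_3(u)}{\sum_{u \in V} d_u(d_u - 1)}
                     = \frac{6\,|K_3|}{|W_2|}.
\end{aligned}
```

Each triangle closes 2 ordered wedges at each of its 3 corners, which gives the factor 6. Since |W_2| is twice the number of length-2 paths, the global form equals the familiar 3 × triangles / length-2 paths.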
12. Higher-order (third-order) local, average, and global clustering coefficients.
• Third-order local clustering coefficient at node u.
• Third-order global clustering coefficient.
• Third-order average clustering coefficient.
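The same reconstruction one order up also follows slide 8. A 3-wedge at u is a triangle containing u plus an edge (u, v) with v outside the triangle, so u has (d_u − 2)K_3(u) of them. Each 4-clique containing u closes 3 of them, one for each choice of v among its other members. Ka(u) is as on slide 17, |K_4| is the number of 4-cliques, W_3 is the set of all 3-wedges, and the avg/glob superscripts are again assumed notation:

```latex
\begin{aligned}
C_3(u) &= \frac{\#\,\text{closed 3-wedges centered at } u}{\#\,\text{3-wedges centered at } u}
        = \frac{3\,K_4(u)}{(d_u - 2)\,K_3(u)}, \\[4pt]
C_3^{\mathrm{avg}} &= \frac{1}{|V|} \sum_{u \in V} C_3(u), \\[4pt]
C_3^{\mathrm{glob}} &= \frac{\sum_{u \in V} 3\,K_4(u)}{\sum_{u \in V} (d_u - 2)\,K_3(u)}
                     = \frac{12\,|K_4|}{|W_3|}.
\end{aligned}
```

Every 4-clique is seen from 4 centers, so the numerator sums to 3 · 4 · |K_4|. Nodes without any 3-wedge need a convention in the average, for example C_3(u) = 0.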
13. We can analyze higher-order clustering with small-world models.
• Start with n nodes on a ring, each with edges to its 2k nearest neighbors, and then rewire each edge with probability p. [Watts-Strogatz 98]
[Figure: example with n = 16, k = 3, p = 0]
Theorem [Watts-Strogatz 98]
[Yin-Benson-Leskovec 18]
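For intuition about the p = 0 starting point, the following sketch builds the ring lattice from the figure (n = 16, k = 3). It then computes local coefficients from neighborhood clique counts using the count form C_ℓ(u) = ℓ K_{ℓ+1}(u) / ((d_u − ℓ + 1) K_ℓ(u)), which follows from counting ℓ-wedges. At p = 0 every node looks the same, so local, average, and global values coincide. The classical value is known to be 3(k − 1)/(2(2k − 1)). Helper names are hypothetical, and rewiring (p > 0) is left out:

```python
from itertools import combinations

def ring_lattice(n, k):
    """Watts-Strogatz starting graph (p = 0): node i links to its k nearest neighbors on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj

def neighborhood_cliques(adj, u, size):
    """Number of size-node cliques among u's neighbors, i.e. K_{size+1}(u)."""
    return sum(
        1
        for S in combinations(sorted(adj[u]), size)
        if all(b in adj[a] for a, b in combinations(S, 2))
    )

def local_hocc(adj, u, order):
    """C_order(u) = order * K_{order+1}(u) / ((d_u - order + 1) * K_order(u))."""
    d = len(adj[u])
    wedges = (d - order + 1) * neighborhood_cliques(adj, u, order - 1)
    closed = order * neighborhood_cliques(adj, u, order)
    return closed / wedges if wedges > 0 else 0.0

k = 3
lattice = ring_lattice(16, k)
print(local_hocc(lattice, 0, 2), 3 * (k - 1) / (2 * (2 * k - 1)))  # both 0.6
print(local_hocc(lattice, 0, 3))  # 1/3: 4 four-cliques close 12 of 36 three-wedges
```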
14. We can also analyze higher-order clustering in Gn,p.
Theorem [Yin-Benson-Leskovec 18]
Everything scales exponentially in the order of the clustering coefficient...
Even if a node’s neighborhood is dense, i.e., C2(u) is large,
higher-order clustering still decays exponentially in Gn,p.
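A quick simulation can illustrate the exponential decay under a simple heuristic. In Gn,p, closing an ℓ-wedge needs ℓ − 1 further edges, from v to the other members of the clique. Each of these is present independently with probability p, so one expects roughly C_ℓ ≈ p^(ℓ−1). This back-of-the-envelope argument is an assumption for illustration, not the theorem's statement. The helper names and the values of n, p, and seed are arbitrary choices:

```python
import random
from itertools import combinations

def gnp(n, p, seed=0):
    """Erdos-Renyi G(n, p): each possible edge appears independently with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for a, b in combinations(range(n), 2):
        if rng.random() < p:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def global_hocc(adj, order):
    """Global C_order: closed order-wedges / all order-wedges, summed over every center u."""
    closed = wedges = 0
    for u, nbrs in adj.items():
        nbrs = sorted(nbrs)
        for rest in combinations(nbrs, order - 1):
            if not all(b in adj[a] for a, b in combinations(rest, 2)):
                continue  # u + rest is not an order-clique
            wedges += len(nbrs) - len(rest)  # every other neighbor v gives an order-wedge
            closed += sum(
                1 for v in nbrs if v not in rest and all(v in adj[w] for w in rest)
            )
    return closed / wedges if wedges else 0.0

p = 0.3
g = gnp(100, p, seed=0)
for order in (2, 3):
    print(order, round(global_hocc(g, order), 3), "vs p**(order-1) =", round(p ** (order - 1), 3))
```

With these settings the estimates should land close to 0.3 and 0.09, shrinking by roughly a factor of p per order.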
15. Extremal combinatorics show relationships between clustering coefficients of different orders.
Theorem [Yin-Benson-Leskovec 18]
16. Local higher-order clustering coefficients hierarchically capture clique density in a node’s neighborhood.
Theorem [Yin-Benson-Leskovec 18]
The product of the first r - 1 local higher-order clustering coefficients, C2(u) · C3(u) ··· Cr(u), is the r-clique density among the neighbors of node u.
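A short derivation shows why the product telescopes. It assumes the count form C_ℓ(u) = ℓ K_{ℓ+1}(u) / ((d_u − ℓ + 1) K_ℓ(u)), where each (ℓ+1)-clique containing u closes ℓ of the ℓ-wedges at u. It also uses K_2(u) = d_u:

```latex
\prod_{\ell=2}^{r} C_\ell(u)
  = \prod_{\ell=2}^{r} \frac{\ell\,K_{\ell+1}(u)}{(d_u - \ell + 1)\,K_\ell(u)}
  = \frac{r!}{(d_u - 1)(d_u - 2)\cdots(d_u - r + 1)} \cdot \frac{K_{r+1}(u)}{K_2(u)}
  = \frac{r!\,K_{r+1}(u)}{d_u(d_u - 1)\cdots(d_u - r + 1)}
  = \frac{K_{r+1}(u)}{\binom{d_u}{r}}
```

K_{r+1}(u) equals the number of r-cliques among u's neighbors, and the binomial coefficient counts all r-subsets of the neighborhood. Their ratio is therefore the r-clique density there. The steps need every K_ℓ(u) with 2 ≤ ℓ ≤ r to be nonzero.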
17. Computation only requires clique participation counts.
We can compute the rth-order higher-order clustering coefficients (HOCCs) by enumerating r- and (r + 1)-cliques.
Ka(u) is the number of a-cliques containing u.
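Because C_ℓ(u) = ℓ K_{ℓ+1}(u) / ((d_u − ℓ + 1) K_ℓ(u)) only needs degrees and clique participation counts, one approach is to enumerate cliques once, tally Ka(u), and derive local, average, and global coefficients from the tallies. This plain-Python sketch uses hypothetical names (clique_participation, hocc) and is not the HigherOrderClustering.jl API. Its simple recursive clique growth is exponential in the worst case and intended for small graphs:

```python
from collections import defaultdict

def clique_participation(adj, max_size):
    """K[a][u] = number of a-cliques containing u, for a = 1..max_size.

    Cliques are grown one node at a time in increasing node order,
    so each clique is generated exactly once (exponential worst case)."""
    K = defaultdict(lambda: defaultdict(int))

    def grow(clique, candidates):
        for u in clique:
            K[len(clique)][u] += 1
        if len(clique) == max_size:
            return
        for x in candidates:
            # Keep later candidates that are also adjacent to x.
            grow(clique + [x], [w for w in candidates if w > x and w in adj[x]])

    for v in sorted(adj):
        grow([v], [w for w in sorted(adj[v]) if w > v])
    return K

def hocc(adj, order):
    """Local, average, and global order-th clustering coefficients from clique counts."""
    K = clique_participation(adj, order + 1)
    local, closed_total, wedge_total = {}, 0, 0
    for u in adj:
        wedges = (len(adj[u]) - order + 1) * K[order][u]
        closed = order * K[order + 1][u]
        local[u] = closed / wedges if wedges > 0 else 0.0  # assumption: 0 without wedges
        closed_total += closed
        wedge_total += wedges
    average = sum(local.values()) / len(adj)
    glob = closed_total / wedge_total if wedge_total > 0 else 0.0
    return local, average, glob

# Hypothetical example: a 4-clique {0, 1, 2, 3} plus a pendant node 4 attached to 0.
adj = defaultdict(set)
for a, b in [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4)]:
    adj[a].add(b)
    adj[b].add(a)
local, average, glob = hocc(adj, 3)
print(local[0], average, glob)  # 0.5 0.7 0.8
```

The enumeration runs once per order and serves every node. In this design choice the enumeration dominates the cost, and the per-node formula is just arithmetic.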
20. Global clustering patterns vary widely across datasets.
                        Global C2   Global C3   Global C4   Pattern
Neural connections      0.18        0.08        0.06        decreases with order
Facebook friendships    0.16        0.11        0.12        decreases, then increases
Coauthorships           0.32        0.33        0.36        increases with order
Not obviously due to cliques in coauthorship! High-degree nodes in coauthorships exhibit clique + star structure where C3(u) > C2(u).
21. Average higher-order clustering also varies widely.
                                        Avg. C2   Avg. C3
Neural connections                      0.31      0.14
   Random configurations                0.15      0.04
   Random configurations (C2 fixed)     0.31      0.17
Facebook friendships                    0.25      0.18
   Random configurations                0.03      0.00
   Random configurations (C2 fixed)     0.25      0.14
Coauthorships                           0.68      0.61
   Random configurations                0.01      0.00
   Random configurations (C2 fixed)     0.68      0.60
[Legend: statistically significantly less clustering · statistically significantly more clustering · not significantly different clustering]
(using sampling tools from [Bollobás 1980; Milo+ 03; Park-Newman 04; Colomer de Simón+ 13])
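The "random configurations" baselines preserve the degree sequence. A common way to sample such graphs is repeated degree-preserving double-edge swaps, in the spirit of [Milo+ 03]. The sketch below shows that generic technique. It is not necessarily the sampler behind these numbers, and it does not implement the "C2 fixed" variant, which also constrains clustering [Park-Newman 04; Colomer de Simón+ 13]. Function names and parameters are hypothetical:

```python
import random
from collections import Counter

def degree_sequence(edges):
    """Map each node to its degree in an undirected edge list."""
    return Counter(x for e in edges for x in e)

def double_edge_swap(edges, n_swaps, seed=0, max_tries_per_swap=100):
    """Degree-preserving randomization of a simple undirected graph.

    Repeatedly picks two edges (a, b), (c, d) and rewires them to (a, d), (c, b),
    rejecting any swap that would create a self-loop or a multi-edge."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = tries = 0
    while done < n_swaps and tries < max_tries_per_swap * n_swaps:
        tries += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible rewirings at random
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: the swap could create a self-loop or duplicate
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # would create a multi-edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

# Example: a 4-regular circulant graph on 12 nodes.
edges = [(i, (i + 1) % 12) for i in range(12)] + [(i, (i + 3) % 12) for i in range(12)]
sample = double_edge_swap(edges, 50, seed=1)
print(degree_sequence(sample) == degree_sequence(edges))  # True: every degree is preserved
```

To compare against such a baseline, one would compute a clustering coefficient on many swapped samples and check where the real network's value falls within that distribution.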
22. Random samples concentrate in neural connections data.
[Figure: random samples vs. the real network (C. elegans); series: Random configurations [Bollobás 1980; Milo 2003] · Random configurations with C2 fixed [Park-Newman 2004; Colomer de Simón+ 2013] · Real network (C. elegans)]
23. Clustering in neural connections is not just due to cliques.
                 Original network   Null model
# 4-cliques      2,010              440 ± 68
C3               0.14               0.17 ± 0.004
The 4-clique count decreases in the null model, but the higher-order clustering coefficient increases.
Key reason. Clustering coefficients are normalized by opportunities to cluster.
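A toy calculation on hypothetical graphs, not drawn from the data, makes the normalization point concrete. It uses C_3(u) = 3K_4(u)/((d_u − 2)K_3(u)), closed 3-wedges over all 3-wedges at u. In graph A, u sits in a 5-clique and also has 10 pendant neighbors. In graph B, u sits in a 4-clique and nothing else:

```latex
\begin{aligned}
\text{A:}\quad & d_u = 14,\ K_3(u) = \tbinom{4}{2} = 6,\ K_4(u) = \tbinom{4}{3} = 4
  &&\Rightarrow\ C_3(u) = \frac{3 \cdot 4}{(14 - 2) \cdot 6} = \frac{1}{6}, \\
\text{B:}\quad & d_u = 3,\ K_3(u) = 3,\ K_4(u) = 1
  &&\Rightarrow\ C_3(u) = \frac{3 \cdot 1}{(3 - 2) \cdot 3} = 1.
\end{aligned}
```

Graph A has four times as many 4-cliques at u yet a much lower C_3(u). Its 10 extra neighbors add 3-wedges that never close.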
24. Changes in higher-order clustering tend to be independent of the degree.
[Figure panels: Neural connections · Facebook friendships · Coauthorships]
25. Local higher-order clustering gives a more nuanced view.
[Figure panels: Neural connections · Facebook friendships · Coauthorships; points: actual network data and random configurations with C2 fixed, shown against a Gn,p baseline and an upper bound; annotations: “Dense but nearly random regions”, “Dense and structured regions”, “Hitting upper bound”]
26. [Figure: average third-order clustering for the Email and Autonomous systems networks; legend: not significantly different clustering · statistically significantly more clustering]
27. We should keep higher-order clustering in mind when mining and modeling network data.
1. Only using triangles gives a misleading notion of clustering.
Some networks do not even exhibit clustering with respect to larger cliques!
→ Are there models that capture higher-order clustering statistics?
2. Higher-order clustering coefficients and closure coefficients offer additional measures of network clustering.
→ We should plug these features into ML pipelines for network data.
3. We examined higher-order structure from dyadic data.
→ What happens if we use hypergraph data?
28. Higher-order clustering in networks.
Thanks for your attention!
Austin R. Benson
http://cs.cornell.edu/~arb
@austinbenson
arb@cs.cornell.edu
Yin, Benson, and Leskovec. Higher-order clustering in networks. Physical Review E, 2018.
Code. github.com/arbenson/HigherOrderClustering.jl
Slides. bit.ly/arb-HONS-18