SlideShare a Scribd company logo
1 of 32
Download to read offline
Vertex Neighborhoods, !
Low Conductance Cuts, !
and Good Seeds for Local
Community Methods
DAVID F. GLEICH
PURDUE
C. SESHADHRI
SANDIA - LIVERMORE
KDD2012
David Gleich · Purdue
Neighborhoods are good communities
Neighborhoods are good communities
^
conductance
^
Vertex
Neighborhoods are good communities
^
conductance
^
A Vertex
(4-4 𝜅)/(3-2 𝜅)
Neighborhoods are good communities
^
conductance
^
A Vertex
(4-4 𝜅)/(3-2 𝜅)
 where 𝜅 is
the clustering
coefficient
and 
the graph has a heavy tailed degree distribution
Neighborhoods are good communities
^
conductance
^
A Vertex
(4-4 𝜅)/(3-2 𝜅)
 where 𝜅 is
the clustering
coefficient
and 
the graph has a heavy tailed degree distribution
is a 
 y
A vertex neighborhood is a
“good” conductance community
in a graph with a heavy-tailed
degree distribution and large
clustering coefficient.
Our contributions
1.  The previous theorem and its proof. This
shows that good communities are expected
and easy to find in modern networks with
heavy-tailed degrees and large clustering.
2.  An empirical evaluation of neighborhood
communities that shows vertex
neighborhoods are the “backbone” of the
network community profile.
KDD2012
David Gleich · Purdue
Formal background for the theorem
1.  Vertex neighborhoods
2.  Low conductance cuts
3.  Clustering coefficients
KDD2012
David Gleich · Purdue
Vertex neighborhoods
The set of a vertex and"
all its neighborhood 
Also called an “egonet”

Prior research on egonets of social networks from
the “structural holes” perspective [Burt95,Kleinberg08]. 
Used for anomaly detection [Akoglu10], "
community seeds [Huang11,Schaeffer11], "
overlapping communities [Schaeffer07,Rees10]. 

 KDD2012
David Gleich · Purdue
Conductance communities
Conductance is one of the most
important community scores [Schaeffer07]
The conductance of a set of vertices is
the ratio of edges leaving to total edges:


Equivalently, it’s the probability that a
random edge leaves the set.
Small conductance ó Good community
(S) =
cut(S)
min vol(S), vol( ¯S)
(edges leaving the set)
(total edges
in the set)
KDD2012
David Gleich · Purdue
cut(S) = 7
vol(S) = 33
vol( ¯S) = 11
(S) = 7/11
Clustering coefficients

Wedge

Global clustering coefficient



 =
number of closed wedges
number of wedges
center of wedge
closed wedge
Probability that a
random wedge
is closed
KDD2012
David Gleich · Purdue
Simple version of theorem
If global clustering coefficient = 1, then "
the graph is a disjoint union of cliques.
Vertex neighborhoods are optimal communities!




KDD2012
David Gleich · Purdue
Theorem
Condition Let graph G have
clustering coefficient 𝜅 and "
have vertex degrees bounded "
by a power-law function with
exponent 𝛾 less than 3.

Theorem Then there exists a vertex
neighborhood with conductance 
log degree
logprobability
↵1n/d
↵2n/d
 4(1 )/(3 2)
KDD2012
David Gleich · Purdue
Proof Sketch

1) Large clustering coefficient "
⇒ many wedges are closed
2) Heavy tailed degree dist "
⇒ a few vertices have a very large degree
3) Large degree ⇒ O(d 2) wedges ⇒ “most” of wedges 

Thus, there must exist a vertex with a high edge density ⇒
“good” conductance 
Use the probabilistic method to formalize
10
0
10
1
10
2
10
3
10
4
0
0.2
0.4
0.6
0.8
1
CDFofNumberofWedges
Degree
KDD2012
David Gleich · Purdue
Confession!
The theory is weak



(S)  4(1 )/(3 2)
Collaboration
networks "
𝜅 ~ [0.1 – 0.5]
Social networks "
𝜅 ~ [0.05 – 0.1]
Graph Verts Edges Avg.
Deg.
Max
Deg.
 ¯C
ca-AstroPh 17903 196972 22.0 504 0.318 0.633
email-Enron 33696 180811 10.7 1383 0.085 0.509
cond-mat-2005 36458 171735 9.4 278 0.243 0.657
arxiv 86376 517563 12.0 1253 0.560 0.678
dblp 226413 716460 6.3 238 0.383 0.635
hollywood-2009 1069126 56306653 105.3 11467 0.310 0.766
fb-Penn94 41536 1362220 65.6 4410 0.098 0.212
fb-A-oneyear 1138557 4404989 7.7 695 0.038 0.060
fb-A 3097165 23667394 15.3 4915 0.048 0.097
soc-LiveJournal1 4843953 42845684 17.7 20333 0.118 0.274
oregon2-010526 11461 32730 5.7 2432 0.037 0.352
p2p-Gnutella25 22663 54693 4.8 66 0.005 0.005
as-22july06 22963 48436 4.2 2390 0.011 0.230
itdk0304 190914 607610 6.4 1071 0.061 0.158
Verts Edges Avg.
Deg.
Max
Deg.
 ¯C
17903 196972 22.0 504 0.318 0.633
n 33696 180811 10.7 1383 0.085 0.509
2005 36458 171735 9.4 278 0.243 0.657
86376 517563 12.0 1253 0.560 0.678
226413 716460 6.3 238 0.383 0.635
2009 1069126 56306653 105.3 11467 0.310 0.766
41536 1362220 65.6 4410 0.098 0.212
ar 1138557 4404989 7.7 695 0.038 0.060
3097165 23667394 15.3 4915 0.048 0.097
urnal1 4843953 42845684 17.7 20333 0.118 0.274
10526 11461 32730 5.7 2432 0.037 0.352
la25 22663 54693 4.8 66 0.005 0.005
6 22963 48436 4.2 2390 0.011 0.230
190914 607610 6.4 1071 0.061 0.158
Tech. networks "
𝜅 ~ [0.005 – 0.05]
This bound is useless
unless 𝜅 ≥ 1/2
KDD2012
David Gleich · Purdue
We view this theory as "
“intuition for the truth”
KDD2012
David Gleich · Purdue
Empirical Evaluation using
Network Community Profiles
la25 [27]) or as
304 [10]). The
the nodes.
and the edges
].
OD
ure to compute
he conductance
ost of the work
erformed when
We can express
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
max
deg
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
max
deg
10
0
10
1
1
10
−4
10
−3
10
0
10
1
1
10
−4
10
−3
fb-A-oneyear
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
10
0
10
1
10
−4
10
−3
10
−2
10
−1
10
0
10
0
10
1
10
−4
10
−3
10
−2
10
−1
10
0
soc-LiveJournal1 ca
10
0
10
0
10
0
10
0
Community Size
Minimum
conductance for
any community of
the given size
Canonical shape
found by
Leskovec, Lang,
Dasgupta, and
Mahoney

Holds for a variety
of approximations
to conductance.
KDD2012
David Gleich · Purdue
Empirical Evaluation using
Network Community Profiles
la25 [27]) or as
304 [10]). The
the nodes.
and the edges
].
OD
ure to compute
he conductance
ost of the work
erformed when
We can express
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
max
deg
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
max
deg
10
0
10
1
1
10
−4
10
−3
10
0
10
1
1
10
−4
10
−3
fb-A-oneyear
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
10
0
10
1
10
−4
10
−3
10
−2
10
−1
10
0
10
0
10
1
10
−4
10
−3
10
−2
10
−1
10
0
soc-LiveJournal1 ca
10
0
10
0
10
0
10
0
Community Size"
(Degree + 1)
Minimum
conductance for
any community
neighborhood of
the given size
“Egonet
community
profile” shows
the same
shape, 3 secs
to compute.
1.1M verts, 4M edges
The Fiedler
community
computed from
the normalized
Laplacian is a
neighborhood!
KDD2012
David Gleich · Purdue
Facebook data
from Wilson et
al. 2009
Not just one graph
10
5
10
5
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
max
deg
ver t s
2
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
max
deg
ver t s
2
arxiv
10
5
10
5
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
ver t s
2
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
ver t s
2
ca-AstroPh
10
0
10
0
t any procedure to compute
o compute the conductance
he graph. Most of the work
e cient is performed when
the vertex. We can express
s:
 {v})/2
ighbors produces a triangle
double-counts). Note also
dges(N1(v))/2 dv. Then
ges(N1(v)). And so, given
compute the cut given the
well. This is easy to do with
tly stores the degrees.
hood communities
network community plot to
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
max
deg
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
max
deg
1
10
−4
10
−3
1
10
−4
10
−3
soc-LiveJournal1
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
1
10
−2
10
−1
10
0
1
10
−2
10
−1
10
0
Number of vertices in cluster N
Figure 2: The best neighbo
ductance at each size (black
arXiv – 86k verts, 500k edges
 soc-LiveJournal – 5M verts, 42M edges
15 more graphs available
www.cs.purdue.edu/~dgleich/codes/neighborhoods

 KDD2012
David Gleich · Purdue
Filling in the !
Network Community Profile
la25 [27]) or as
304 [10]). The
the nodes.
and the edges
].
OD
ure to compute
he conductance
ost of the work
erformed when
We can express
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
max
deg
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
max
deg
10
0
10
1
1
10
−4
10
−3
10
0
10
1
1
10
−4
10
−3
fb-A-oneyear
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
10
0
10
1
10
−4
10
−3
10
−2
10
−1
10
0
10
0
10
1
10
−4
10
−3
10
−2
10
−1
10
0
soc-LiveJournal1 ca
10
0
10
0
10
0
10
0
Minimum
conductance for
any community
neighborhood of
the given size
We are missing
a region of the
NCP when we
just look at
neighborhoods 
KDD2012
David Gleich · Purdue
Community Size"
(Degree + 1)
Personalized PageRank
Communities [Andersen06]
To find the canonical NCP structure, Leskovec et
al. used a personalized PageRank based
community finder. 
These start with a single vertex seed, and then
expand the community based on the solution of a
personalized PageRank problem.
The resulting community satisfies a local Cheeger
inequality.
This needs to run thousands of times for an NCP
KDD2012
David Gleich · Purdue
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
Filling in the !
Network Community Profile
la25 [27]) or as
304 [10]). The
the nodes.
and the edges
].
OD
ure to compute
he conductance
ost of the work
erformed when
We can express
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
max
deg
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
max
deg
10
0
10
1
1
10
−4
10
−3
10
0
10
1
1
10
−4
10
−3
fb-A-oneyear
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
10
0
10
1
10
−4
10
−3
10
−2
10
−1
10
0
10
0
10
1
10
−4
10
−3
10
−2
10
−1
10
0
soc-LiveJournal1 ca
10
0
10
0
10
0
10
0
Minimum
conductance for
any community of
the given size

7807 seconds
This region
fills when
using the
PPR method
(like now!)
KDD2012
David Gleich · Purdue
Community Size"
Vertex Neighborhoods, !
Low Conductance Cuts, !
and Good Seeds for Local
Community Methods
KDD2012
David Gleich · Purdue
Am I a good seed?!
Locally Minimal Communities
“My conductance is the best locally.”



(N(v))  (N(w))
for all w adjacent to v
In Zachary’s Karate Club
network, there are four locally
minimal communities, the two
leaders and two peripheral
nodes.
KDD2012
David Gleich · Purdue
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
Locally minimal communities
capture extremal neighborhoods
la25 [27]) or as
304 [10]). The
the nodes.
and the edges
].
OD
ure to compute
he conductance
ost of the work
erformed when
We can express
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
max
deg
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
max
deg
10
0
10
1
1
10
−4
10
−3
10
0
10
1
1
10
−4
10
−3
fb-A-oneyear
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
10
0
10
1
10
−4
10
−3
10
−2
10
−1
10
0
10
0
10
1
10
−4
10
−3
10
−2
10
−1
10
0
soc-LiveJournal1 ca
10
0
10
0
10
0
10
0
Red dots are
conductance "
and size of a "
locally minimal
community

Usually about 1%
of # of vertices.
The red
circles – the
best local
mins – find
the extremes
in the egonet
profile.
KDD2012
David Gleich · Purdue
Community Size"
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
Filling in the NCP!
Growing locally minimal comm.
la25 [27]) or as
304 [10]). The
the nodes.
and the edges
].
OD
ure to compute
he conductance
ost of the work
erformed when
We can express
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
max
deg
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
max
deg
10
0
10
1
1
10
−4
10
−3
10
0
10
1
1
10
−4
10
−3
fb-A-oneyear
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
10
0
10
1
10
−4
10
−3
10
−2
10
−1
10
0
10
0
10
1
10
−4
10
−3
10
−2
10
−1
10
0
soc-LiveJournal1 ca
10
0
10
0
10
0
10
0
Growing only
locally minimal
communities


283 seconds
vs.
7807 seconds
Full NCP
Locally min
NCP
Original
Egonet
KDD2012
David Gleich · Purdue
Community Size"
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
ver t s
2
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
ver t s
2
Filling in the NCP!
Growing locally minimal comm.
Growing only
locally minimal
communities 


143 seconds
vs.
2211 seconds
Full NCP
Locally min
NCP
Original
Egonets
arXiv – 86k verts, 500k edges
KDD2012
David Gleich · Purdue
Community Size"
Recap
A theorem relating clustering,"
heavy-tailed degrees, and"
low-conductance cuts of "
vertex neighborhoods.
Empirical evaluation of "
vertex neighborhoods.
More on k-cores in the paper.
⇒  Many communities are easy to find!
⇒  Explains success of community detection?
Acknowledgements!
David supported by NSF CAREER
award 1149756-CCF.
Sesh supported by the Sandia
LDRD program (project 158477) and
the applied mathematics program at
the Dept. of Energy.

KDD2012
David Gleich · Purdue
Code and results available online
www.cs.purdue.edu/~dgleich/
codes/neighborhoods
Two words on computing
Can be done by just
counting the triangles at
each node. Linear
complexity in |E| in a power-
law graph.

It’s possible to do this in
MapReduce too.
KDD2012
David Gleich · Purdue
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
ver t s
2
10
0
10
1
10
2
10
3
10
4
10
5
10
−4
10
−3
10
−2
10
−1
10
0
max
deg
ver t s
2
Filling in the NCP!
Growing k cores
Growing only
locally minimal
communities 
and k-cores 

143 seconds
vs.
2211 seconds
Full NCP
Locally min
NCP
Original
Egonets
arXiv – 86k verts, 500k edges
KDD2012
David Gleich · Purdue
Community Size"

PPR grown
k-cores
k-cores
Clustering coefficients
Wedge

Global clustering coefficient


Local clustering coefficient

 =
number of closed wedges
number of wedges
Cv =
number of closed wedges centered at v
number of wedges centered at v
center of wedge
closed wedge
Probability that a
random wedge
is closed
KDD2012
David Gleich · Purdue

More Related Content

Viewers also liked

Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...David Gleich
 
A history of PageRank from the numerical computing perspective
A history of PageRank from the numerical computing perspectiveA history of PageRank from the numerical computing perspective
A history of PageRank from the numerical computing perspectiveDavid Gleich
 
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...David Gleich
 
Direct tall-and-skinny QR factorizations in MapReduce architectures
Direct tall-and-skinny QR factorizations in MapReduce architecturesDirect tall-and-skinny QR factorizations in MapReduce architectures
Direct tall-and-skinny QR factorizations in MapReduce architecturesDavid Gleich
 
MapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsMapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsDavid Gleich
 
Tall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceTall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceDavid Gleich
 
A multithreaded method for network alignment
A multithreaded method for network alignmentA multithreaded method for network alignment
A multithreaded method for network alignmentDavid Gleich
 
Relaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksRelaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksDavid Gleich
 
Tall-and-skinny QR factorizations in MapReduce architectures
Tall-and-skinny QR factorizations in MapReduce architecturesTall-and-skinny QR factorizations in MapReduce architectures
Tall-and-skinny QR factorizations in MapReduce architecturesDavid Gleich
 
Spacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisSpacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisDavid Gleich
 
How does Google Google: A journey into the wondrous mathematics behind your f...
How does Google Google: A journey into the wondrous mathematics behind your f...How does Google Google: A journey into the wondrous mathematics behind your f...
How does Google Google: A journey into the wondrous mathematics behind your f...David Gleich
 
Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential David Gleich
 
A dynamical system for PageRank with time-dependent teleportation
A dynamical system for PageRank with time-dependent teleportationA dynamical system for PageRank with time-dependent teleportation
A dynamical system for PageRank with time-dependent teleportationDavid Gleich
 
Higher-order organization of complex networks
Higher-order organization of complex networksHigher-order organization of complex networks
Higher-order organization of complex networksDavid Gleich
 
MapReduce for scientific simulation analysis
MapReduce for scientific simulation analysisMapReduce for scientific simulation analysis
MapReduce for scientific simulation analysisDavid Gleich
 
Personalized PageRank based community detection
Personalized PageRank based community detectionPersonalized PageRank based community detection
Personalized PageRank based community detectionDavid Gleich
 
Recommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLRecommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLDavid Gleich
 
Localized methods for diffusions in large graphs
Localized methods for diffusions in large graphsLocalized methods for diffusions in large graphs
Localized methods for diffusions in large graphsDavid Gleich
 
Sparse matrix computations in MapReduce
Sparse matrix computations in MapReduceSparse matrix computations in MapReduce
Sparse matrix computations in MapReduceDavid Gleich
 
Massive MapReduce Matrix Computations & Multicore Graph Algorithms
Massive MapReduce Matrix Computations & Multicore Graph AlgorithmsMassive MapReduce Matrix Computations & Multicore Graph Algorithms
Massive MapReduce Matrix Computations & Multicore Graph AlgorithmsDavid Gleich
 

Viewers also liked (20)

Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
 
A history of PageRank from the numerical computing perspective
A history of PageRank from the numerical computing perspectiveA history of PageRank from the numerical computing perspective
A history of PageRank from the numerical computing perspective
 
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...
 
Direct tall-and-skinny QR factorizations in MapReduce architectures
Direct tall-and-skinny QR factorizations in MapReduce architecturesDirect tall-and-skinny QR factorizations in MapReduce architectures
Direct tall-and-skinny QR factorizations in MapReduce architectures
 
MapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsMapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applications
 
Tall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceTall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduce
 
A multithreaded method for network alignment
A multithreaded method for network alignmentA multithreaded method for network alignment
A multithreaded method for network alignment
 
Relaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksRelaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networks
 
Tall-and-skinny QR factorizations in MapReduce architectures
Tall-and-skinny QR factorizations in MapReduce architecturesTall-and-skinny QR factorizations in MapReduce architectures
Tall-and-skinny QR factorizations in MapReduce architectures
 
Spacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisSpacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysis
 
How does Google Google: A journey into the wondrous mathematics behind your f...
How does Google Google: A journey into the wondrous mathematics behind your f...How does Google Google: A journey into the wondrous mathematics behind your f...
How does Google Google: A journey into the wondrous mathematics behind your f...
 
Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential
 
A dynamical system for PageRank with time-dependent teleportation
A dynamical system for PageRank with time-dependent teleportationA dynamical system for PageRank with time-dependent teleportation
A dynamical system for PageRank with time-dependent teleportation
 
Higher-order organization of complex networks
Higher-order organization of complex networksHigher-order organization of complex networks
Higher-order organization of complex networks
 
MapReduce for scientific simulation analysis
MapReduce for scientific simulation analysisMapReduce for scientific simulation analysis
MapReduce for scientific simulation analysis
 
Personalized PageRank based community detection
Personalized PageRank based community detectionPersonalized PageRank based community detection
Personalized PageRank based community detection
 
Recommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLRecommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQL
 
Localized methods for diffusions in large graphs
Localized methods for diffusions in large graphsLocalized methods for diffusions in large graphs
Localized methods for diffusions in large graphs
 
Sparse matrix computations in MapReduce
Sparse matrix computations in MapReduceSparse matrix computations in MapReduce
Sparse matrix computations in MapReduce
 
Massive MapReduce Matrix Computations & Multicore Graph Algorithms
Massive MapReduce Matrix Computations & Multicore Graph AlgorithmsMassive MapReduce Matrix Computations & Multicore Graph Algorithms
Massive MapReduce Matrix Computations & Multicore Graph Algorithms
 

Similar to Vertex neighborhoods, low conductance cuts, and good seeds for local community methods

Higher-order clustering coefficients
Higher-order clustering coefficientsHigher-order clustering coefficients
Higher-order clustering coefficientsAustin Benson
 
advanced engineering mathematics-erwin kreyszig.pdf
advanced engineering mathematics-erwin kreyszig.pdfadvanced engineering mathematics-erwin kreyszig.pdf
advanced engineering mathematics-erwin kreyszig.pdfcibeyo cibeyo
 
Finding Meaning in Points, Areas and Surfaces: Spatial Analysis in R
Finding Meaning in Points, Areas and Surfaces: Spatial Analysis in RFinding Meaning in Points, Areas and Surfaces: Spatial Analysis in R
Finding Meaning in Points, Areas and Surfaces: Spatial Analysis in RRevolution Analytics
 
Electronic spectra problems
Electronic spectra problemsElectronic spectra problems
Electronic spectra problemsSANTHANAM V
 
Analysis of convection diffusion problems at various peclet numbers using fin...
Analysis of convection diffusion problems at various peclet numbers using fin...Analysis of convection diffusion problems at various peclet numbers using fin...
Analysis of convection diffusion problems at various peclet numbers using fin...Alexander Decker
 
Spectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresSpectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresDavid Gleich
 
Numerical and analytical studies of single and multiphase starting jets and p...
Numerical and analytical studies of single and multiphase starting jets and p...Numerical and analytical studies of single and multiphase starting jets and p...
Numerical and analytical studies of single and multiphase starting jets and p...Ruo-Qian (Roger) Wang
 
Non-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansNon-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansDavid Gleich
 
Graph Sample and Hold: A Framework for Big Graph Analytics
Graph Sample and Hold: A Framework for Big Graph AnalyticsGraph Sample and Hold: A Framework for Big Graph Analytics
Graph Sample and Hold: A Framework for Big Graph AnalyticsNesreen K. Ahmed
 
Microwave-Circuit-Design.pptx
Microwave-Circuit-Design.pptxMicrowave-Circuit-Design.pptx
Microwave-Circuit-Design.pptxSKILL2021
 
Subthreshold swing model using scale length for sub-10 nm junction-based doub...
Subthreshold swing model using scale length for sub-10 nm junction-based doub...Subthreshold swing model using scale length for sub-10 nm junction-based doub...
Subthreshold swing model using scale length for sub-10 nm junction-based doub...IJECEIAES
 
Senior Research paper
Senior Research paperSenior Research paper
Senior Research paperEvan Foley
 
Data sparse approximation of the Karhunen-Loeve expansion
Data sparse approximation of the Karhunen-Loeve expansionData sparse approximation of the Karhunen-Loeve expansion
Data sparse approximation of the Karhunen-Loeve expansionAlexander Litvinenko
 

Similar to Vertex neighborhoods, low conductance cuts, and good seeds for local community methods (20)

Higher-order clustering coefficients
Higher-order clustering coefficientsHigher-order clustering coefficients
Higher-order clustering coefficients
 
advanced engineering mathematics-erwin kreyszig.pdf
advanced engineering mathematics-erwin kreyszig.pdfadvanced engineering mathematics-erwin kreyszig.pdf
advanced engineering mathematics-erwin kreyszig.pdf
 
06 Filtration.ppt
06 Filtration.ppt06 Filtration.ppt
06 Filtration.ppt
 
Finding Meaning in Points, Areas and Surfaces: Spatial Analysis in R
Finding Meaning in Points, Areas and Surfaces: Spatial Analysis in RFinding Meaning in Points, Areas and Surfaces: Spatial Analysis in R
Finding Meaning in Points, Areas and Surfaces: Spatial Analysis in R
 
Electronic spectra problems
Electronic spectra problemsElectronic spectra problems
Electronic spectra problems
 
Interactive High-Dimensional Visualization of Social Graphs
Interactive High-Dimensional Visualization of Social GraphsInteractive High-Dimensional Visualization of Social Graphs
Interactive High-Dimensional Visualization of Social Graphs
 
Network analysis
Network analysisNetwork analysis
Network analysis
 
Analysis of convection diffusion problems at various peclet numbers using fin...
Analysis of convection diffusion problems at various peclet numbers using fin...Analysis of convection diffusion problems at various peclet numbers using fin...
Analysis of convection diffusion problems at various peclet numbers using fin...
 
Spectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresSpectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structures
 
Numerical and analytical studies of single and multiphase starting jets and p...
Numerical and analytical studies of single and multiphase starting jets and p...Numerical and analytical studies of single and multiphase starting jets and p...
Numerical and analytical studies of single and multiphase starting jets and p...
 
Non-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansNon-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-means
 
Graph Sample and Hold: A Framework for Big Graph Analytics
Graph Sample and Hold: A Framework for Big Graph AnalyticsGraph Sample and Hold: A Framework for Big Graph Analytics
Graph Sample and Hold: A Framework for Big Graph Analytics
 
Microwave-Circuit-Design.pptx
Microwave-Circuit-Design.pptxMicrowave-Circuit-Design.pptx
Microwave-Circuit-Design.pptx
 
Design of Downstream Blanket
Design of Downstream BlanketDesign of Downstream Blanket
Design of Downstream Blanket
 
Subthreshold swing model using scale length for sub-10 nm junction-based doub...
Subthreshold swing model using scale length for sub-10 nm junction-based doub...Subthreshold swing model using scale length for sub-10 nm junction-based doub...
Subthreshold swing model using scale length for sub-10 nm junction-based doub...
 
Bi4301333340
Bi4301333340Bi4301333340
Bi4301333340
 
rgDefense
rgDefensergDefense
rgDefense
 
Senior Research paper
Senior Research paperSenior Research paper
Senior Research paper
 
Data sparse approximation of the Karhunen-Loeve expansion
Data sparse approximation of the Karhunen-Loeve expansionData sparse approximation of the Karhunen-Loeve expansion
Data sparse approximation of the Karhunen-Loeve expansion
 
Slides
SlidesSlides
Slides
 

More from David Gleich

Engineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisEngineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisDavid Gleich
 
Correlation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networksCorrelation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networksDavid Gleich
 
Spacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsSpacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsDavid Gleich
 
Localized methods in graph mining
Localized methods in graph miningLocalized methods in graph mining
Localized methods in graph miningDavid Gleich
 
PageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresPageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresDavid Gleich
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structuresDavid Gleich
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsDavid Gleich
 
Fast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and moreFast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and moreDavid Gleich
 
Matrix methods for Hadoop
Matrix methods for HadoopMatrix methods for Hadoop
Matrix methods for HadoopDavid Gleich
 

More from David Gleich (9)

Engineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisEngineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network Analysis
 
Correlation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networksCorrelation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networks
 
Spacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsSpacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chains
 
Localized methods in graph mining
Localized methods in graph miningLocalized methods in graph mining
Localized methods in graph mining
 
PageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresPageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structures
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structures
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphs
 
Fast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and moreFast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and more
 
Matrix methods for Hadoop
Matrix methods for HadoopMatrix methods for Hadoop
Matrix methods for Hadoop
 

Recently uploaded

My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 

Recently uploaded (20)

My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 

Vertex neighborhoods, low conductance cuts, and good seeds for local community methods

  • 1. Vertex Neighborhoods, ! Low Conductance Cuts, ! and Good Seeds for Local Community Methods DAVID F. GLEICH PURDUE C. SESHADHRI SANDIA - LIVERMORE KDD2012 David Gleich · Purdue
  • 3. Neighborhoods are good communities ^ conductance ^ Vertex
  • 4. Neighborhoods are good communities ^ conductance ^ A Vertex (4-4 𝜅)/(3-2 𝜅)
  • 5. Neighborhoods are good communities ^ conductance ^ A Vertex (4-4 𝜅)/(3-2 𝜅) where 𝜅 is the clustering coefficient and the graph has a heavy tailed degree distribution
  • 6. Neighborhoods are good communities ^ conductance ^ A Vertex (4-4 𝜅)/(3-2 𝜅) where 𝜅 is the clustering coefficient and the graph has a heavy tailed degree distribution is a y
  • 7. A vertex neighborhood is a “good” conductance community in a graph with a heavy-tailed degree distribution and large clustering coefficient.
  • 8. Our contributions 1.  The previous theorem and its proof. This shows that good communities are expected and easy to find in modern networks with heavy-tailed degrees and large clustering. 2.  An empirical evaluation of neighborhood communities that shows vertex neighborhoods are the “backbone” of the network community profile. KDD2012 David Gleich · Purdue
  • 9. Formal background for the theorem 1.  Vertex neighborhoods 2.  Low conductance cuts 3.  Clustering coefficients KDD2012 David Gleich · Purdue
  • 10. Vertex neighborhoods The set of a vertex and" all its neighborhood Also called an “egonet” Prior research on egonets of social networks from the “structural holes” perspective [Burt95,Kleinberg08]. Used for anomaly detection [Akoglu10], " community seeds [Huang11,Schaeffer11], " overlapping communities [Schaeffer07,Rees10]. KDD2012 David Gleich · Purdue
  • 11. Conductance communities Conductance is one of the most important community scores [Schaeffer07] The conductance of a set of vertices is the ratio of edges leaving to total edges: Equivalently, it’s the probability that a random edge leaves the set. Small conductance ó Good community (S) = cut(S) min vol(S), vol( ¯S) (edges leaving the set) (total edges in the set) KDD2012 David Gleich · Purdue cut(S) = 7 vol(S) = 33 vol( ¯S) = 11 (S) = 7/11
  • 12. Clustering coefficients Wedge Global clustering coefficient  = number of closed wedges number of wedges center of wedge closed wedge Probability that a random wedge is closed KDD2012 David Gleich · Purdue
  • 13. Simple version of theorem If global clustering coefficient = 1, then " the graph is a disjoint union of cliques. Vertex neighborhoods are optimal communities! KDD2012 David Gleich · Purdue
  • 14. Theorem Condition Let graph G have clustering coefficient 𝜅 and " have vertex degrees bounded " by a power-law function with exponent 𝛾 less than 3. Theorem Then there exists a vertex neighborhood with conductance log degree logprobability ↵1n/d ↵2n/d  4(1 )/(3 2) KDD2012 David Gleich · Purdue
  • 15. Proof Sketch 1) Large clustering coefficient " ⇒ many wedges are closed 2) Heavy tailed degree dist " ⇒ a few vertices have a very large degree 3) Large degree ⇒ O(d 2) wedges ⇒ “most” of wedges Thus, there must exist a vertex with a high edge density ⇒ “good” conductance Use the probabilistic method to formalize 10 0 10 1 10 2 10 3 10 4 0 0.2 0.4 0.6 0.8 1 CDFofNumberofWedges Degree KDD2012 David Gleich · Purdue
  • 16. Confession! The theory is weak (S)  4(1 )/(3 2) Collaboration networks " 𝜅 ~ [0.1 – 0.5] Social networks " 𝜅 ~ [0.05 – 0.1] Graph Verts Edges Avg. Deg. Max Deg.  ¯C ca-AstroPh 17903 196972 22.0 504 0.318 0.633 email-Enron 33696 180811 10.7 1383 0.085 0.509 cond-mat-2005 36458 171735 9.4 278 0.243 0.657 arxiv 86376 517563 12.0 1253 0.560 0.678 dblp 226413 716460 6.3 238 0.383 0.635 hollywood-2009 1069126 56306653 105.3 11467 0.310 0.766 fb-Penn94 41536 1362220 65.6 4410 0.098 0.212 fb-A-oneyear 1138557 4404989 7.7 695 0.038 0.060 fb-A 3097165 23667394 15.3 4915 0.048 0.097 soc-LiveJournal1 4843953 42845684 17.7 20333 0.118 0.274 oregon2-010526 11461 32730 5.7 2432 0.037 0.352 p2p-Gnutella25 22663 54693 4.8 66 0.005 0.005 as-22july06 22963 48436 4.2 2390 0.011 0.230 itdk0304 190914 607610 6.4 1071 0.061 0.158 Verts Edges Avg. Deg. Max Deg.  ¯C 17903 196972 22.0 504 0.318 0.633 n 33696 180811 10.7 1383 0.085 0.509 2005 36458 171735 9.4 278 0.243 0.657 86376 517563 12.0 1253 0.560 0.678 226413 716460 6.3 238 0.383 0.635 2009 1069126 56306653 105.3 11467 0.310 0.766 41536 1362220 65.6 4410 0.098 0.212 ar 1138557 4404989 7.7 695 0.038 0.060 3097165 23667394 15.3 4915 0.048 0.097 urnal1 4843953 42845684 17.7 20333 0.118 0.274 10526 11461 32730 5.7 2432 0.037 0.352 la25 22663 54693 4.8 66 0.005 0.005 6 22963 48436 4.2 2390 0.011 0.230 190914 607610 6.4 1071 0.061 0.158 Tech. networks " 𝜅 ~ [0.005 – 0.05] This bound is useless unless 𝜅 ≥ 1/2 KDD2012 David Gleich · Purdue
  • 17. We view this theory as " “intuition for the truth” KDD2012 David Gleich · Purdue
  • 18. Empirical Evaluation using Network Community Profiles la25 [27]) or as 304 [10]). The the nodes. and the edges ]. OD ure to compute he conductance ost of the work erformed when We can express 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 max deg 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 max deg 10 0 10 1 1 10 −4 10 −3 10 0 10 1 1 10 −4 10 −3 fb-A-oneyear 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg 10 0 10 1 10 −4 10 −3 10 −2 10 −1 10 0 10 0 10 1 10 −4 10 −3 10 −2 10 −1 10 0 soc-LiveJournal1 ca 10 0 10 0 10 0 10 0 Community Size Minimum conductance for any community of the given size Canonical shape found by Leskovec, Lang, Dasgupta, and Mahoney Holds for a variety of approximations to conductance. KDD2012 David Gleich · Purdue
  • 19. Empirical Evaluation using Network Community Profiles la25 [27]) or as 304 [10]). The the nodes. and the edges ]. OD ure to compute he conductance ost of the work erformed when We can express 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 max deg 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 max deg 10 0 10 1 1 10 −4 10 −3 10 0 10 1 1 10 −4 10 −3 fb-A-oneyear 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg 10 0 10 1 10 −4 10 −3 10 −2 10 −1 10 0 10 0 10 1 10 −4 10 −3 10 −2 10 −1 10 0 soc-LiveJournal1 ca 10 0 10 0 10 0 10 0 Community Size" (Degree + 1) Minimum conductance for any community neighborhood of the given size “Egonet community profile” shows the same shape, 3 secs to compute. 1.1M verts, 4M edges The Fiedler community computed from the normalized Laplacian is a neighborhood! KDD2012 David Gleich · Purdue Facebook data from Wilson et al. 2009
  • 20. Not just one graph 10 5 10 5 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 max deg ver t s 2 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 max deg ver t s 2 arxiv 10 5 10 5 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg ver t s 2 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg ver t s 2 ca-AstroPh 10 0 10 0 t any procedure to compute o compute the conductance he graph. Most of the work e cient is performed when the vertex. We can express s: {v})/2 ighbors produces a triangle double-counts). Note also dges(N1(v))/2 dv. Then ges(N1(v)). And so, given compute the cut given the well. This is easy to do with tly stores the degrees. hood communities network community plot to 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 max deg 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 max deg 1 10 −4 10 −3 1 10 −4 10 −3 soc-LiveJournal1 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg 1 10 −2 10 −1 10 0 1 10 −2 10 −1 10 0 Number of vertices in cluster N Figure 2: The best neighbo ductance at each size (black arXiv – 86k verts, 500k edges soc-LiveJournal – 5M verts, 42M edges 15 more graphs available www.cs.purdue.edu/~dgleich/codes/neighborhoods KDD2012 David Gleich · Purdue
  • 21. Filling in the ! Network Community Profile la25 [27]) or as 304 [10]). The the nodes. and the edges ]. OD ure to compute he conductance ost of the work erformed when We can express 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 max deg 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 max deg 10 0 10 1 1 10 −4 10 −3 10 0 10 1 1 10 −4 10 −3 fb-A-oneyear 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg 10 0 10 1 10 −4 10 −3 10 −2 10 −1 10 0 10 0 10 1 10 −4 10 −3 10 −2 10 −1 10 0 soc-LiveJournal1 ca 10 0 10 0 10 0 10 0 Minimum conductance for any community neighborhood of the given size We are missing a region of the NCP when we just look at neighborhoods KDD2012 David Gleich · Purdue Community Size" (Degree + 1)
  • 22. Personalized PageRank Communities [Andersen06] To find the canonical NCP structure, Leskovec et al. used a personalized PageRank based community finder. These start with a single vertex seed, and then expand the community based on the solution of a personalized PageRank problem. The resulting community satisfies a local Cheeger inequality. This needs to run thousands of times for an NCP KDD2012 David Gleich · Purdue
  • 23. 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg Filling in the ! Network Community Profile la25 [27]) or as 304 [10]). The the nodes. and the edges ]. OD ure to compute he conductance ost of the work erformed when We can express 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 max deg 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 max deg 10 0 10 1 1 10 −4 10 −3 10 0 10 1 1 10 −4 10 −3 fb-A-oneyear 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg 10 0 10 1 10 −4 10 −3 10 −2 10 −1 10 0 10 0 10 1 10 −4 10 −3 10 −2 10 −1 10 0 soc-LiveJournal1 ca 10 0 10 0 10 0 10 0 Minimum conductance for any community of the given size 7807 seconds This region fills when using the PPR method (like now!) KDD2012 David Gleich · Purdue Community Size"
  • 24. Vertex Neighborhoods, ! Low Conductance Cuts, ! and Good Seeds for Local Community Methods KDD2012 David Gleich · Purdue
  • 25. Am I a good seed?! Locally Minimal Communities “My conductance is the best locally.” (N(v))  (N(w)) for all w adjacent to v In Zachary’s Karate Club network, there are four locally minimal communities, the two leaders and two peripheral nodes. KDD2012 David Gleich · Purdue
  • 26. 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg Locally minimal communities capture extremal neighborhoods la25 [27]) or as 304 [10]). The the nodes. and the edges ]. OD ure to compute he conductance ost of the work erformed when We can express 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 max deg 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 max deg 10 0 10 1 1 10 −4 10 −3 10 0 10 1 1 10 −4 10 −3 fb-A-oneyear 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg 10 0 10 1 10 −4 10 −3 10 −2 10 −1 10 0 10 0 10 1 10 −4 10 −3 10 −2 10 −1 10 0 soc-LiveJournal1 ca 10 0 10 0 10 0 10 0 Red dots are conductance " and size of a " locally minimal community Usually about 1% of # of vertices. The red circles – the best local mins – find the extremes in the egonet profile. KDD2012 David Gleich · Purdue Community Size"
  • 27. 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg Filling in the NCP! Growing locally minimal comm. la25 [27]) or as 304 [10]). The the nodes. and the edges ]. OD ure to compute he conductance ost of the work erformed when We can express 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 max deg 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 max deg 10 0 10 1 1 10 −4 10 −3 10 0 10 1 1 10 −4 10 −3 fb-A-oneyear 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg 10 0 10 1 10 −4 10 −3 10 −2 10 −1 10 0 10 0 10 1 10 −4 10 −3 10 −2 10 −1 10 0 soc-LiveJournal1 ca 10 0 10 0 10 0 10 0 Growing only locally minimal communities 283 seconds vs. 7807 seconds Full NCP Locally min NCP Original Egonet KDD2012 David Gleich · Purdue Community Size"
  • 28. 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg ver t s 2 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg ver t s 2 Filling in the NCP! Growing locally minimal comm. Growing only locally minimal communities 143 seconds vs. 2211 seconds Full NCP Locally min NCP Original Egonets arXiv – 86k verts, 500k edges KDD2012 David Gleich · Purdue Community Size"
  • 29. Recap A theorem relating clustering," heavy-tailed degrees, and" low-conductance cuts of " vertex neighborhoods. Empirical evaluation of " vertex neighborhoods. More on k-cores in the paper. ⇒  Many communities are easy to find! ⇒  Explains success of community detection? Acknowledgements! David supported by NSF CAREER award 1149756-CCF. Sesh supported by the Sandia LDRD program (project 158477) and the applied mathematics program at the Dept. of Energy. KDD2012 David Gleich · Purdue Code and results available online www.cs.purdue.edu/~dgleich/ codes/neighborhoods
  • 30. Two words on computing Can be done by just counting the triangles at each node. Linear complexity in |E| in a power- law graph. It’s possible to do this in MapReduce too. KDD2012 David Gleich · Purdue
  • 31. 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg ver t s 2 10 0 10 1 10 2 10 3 10 4 10 5 10 −4 10 −3 10 −2 10 −1 10 0 max deg ver t s 2 Filling in the NCP! Growing k cores Growing only locally minimal communities and k-cores 143 seconds vs. 2211 seconds Full NCP Locally min NCP Original Egonets arXiv – 86k verts, 500k edges KDD2012 David Gleich · Purdue Community Size" PPR grown k-cores k-cores
  • 32. Clustering coefficients Wedge Global clustering coefficient Local clustering coefficient  = number of closed wedges number of wedges Cv = number of closed wedges centered at v number of wedges centered at v center of wedge closed wedge Probability that a random wedge is closed KDD2012 David Gleich · Purdue