SlideShare une entreprise Scribd logo
1  sur  24
Télécharger pour lire hors ligne
Community Detection in Social
        Networks
           Lei Tang
Properties of Complex Network



  Power Law




                   Community Structure




   Small World
Why Community Detection?
Communities in a citation network might represent
related papers on a single topic;
Communities on the web might represent pages of
related topics;
Community can be considered as a summary of
the whole network thus easy to visualize and
understand.
Sometimes, community can reveal the properties
without releasing the individual privacy information.
Community Detection,
Reinventing the wheel?
Community Detection = Clustering?

As I understand, community detection is
essentially clustering.
But why so many works on Community
Detection? (in physical review, KDD, WWW)
The network data pose challenges to
classical clustering method.
Difference
Clustering works on the distance or similarity matrix (k-
means, hierarchical clustering, spectral clustering)
Network data tends to be “discrete”, leading to algorithms
using the graph property directly (k-clique, quasi-clique,
vertex-betweenness, edge-betweeness etc.)
Real-world network is large scale! Sometimes, even n^2 in
unbearable for efficiency or space (local/distributed
clustering, network approximation, sampling method)
Outline
Two recent community detection methods
Clustering based on shortest-path
betweenness
Clustering based on network modularity
Basic Idea
 A simple divisive strategy:
 Repeat
1. Find out one “inter-community” edge
2. Remove the edge
3. Check if there’s any disconnected components (which
    corresponds to a community)
How to measure “inter-community”
If two communities are joined by a few inter-community
edges, then all the paths from one community to another
must pass the edges.
Various measures:
Edge Betweenness: find the shortest paths
between all pairs of nodes and count how
many run along each edge.
Random Walk betweenness.
Current-flow betweenness
Shortest-path betweenness
Computation could be expensive: calculating the shortest
path between one pair is O(m), and there are O(n^2) pairs.
Could be optimized to O(mn)
Simple case: only one shortest path
                    When there is only one single path between the
                    Source S and other vertex, then those paths form
                    a tree.

                    Bottom-up: start from the leaves, assign edges to 1.

                    Count of parent edge = sum (count of children edge)+1
Multiple shortest path
First compute the number of paths from source to other
vertex
Then assign a proper weight for the path counts
 sum of the betweenness =.number of reachable
vertices.
Calculate #shortest path
1.Initial distance




                         W:Number of shortest paths
Update edge weight




               Edge weight
Time Complexity
O(mn) in each iteration.
Could be accelerated by noting that only the
nodes in the connected component would be
affected.
Some other techniques developed: sampling
strategy to approximate the betweenness; use
specific network index for speed.
Obtain a hierarchical tree, use modularity
To determine the number of clusters.
Modularity
Spectral clustering essentially tries to minimize the number
edges between groups.
Modularity consider the number edges which is smaller than
expected.




If the difference is significantly large, there’s a community
structure inside.
The larger, the better.
Quiz
Given a network of m edges, for two nodes
with degree ki, kj, what is the expected
edges between these two nodes?
Modularity Calculation




Modularity can be used to determine the
number of clusters, why not maximize it
directly?
Unfortunately, it’s NP-hard
Relaxation


             Eigen Value
              Problem!



                                            Modularity
                                             Matrix


Betai is the eigen value of the
Eigen vector ui of modularity matrix B
Properties of Modularity Matrix


(1,1,…1) is an eigen vector with zero eigen value.
Different from graph Laplacian, the eigen value of
modularity matrix could be +, 0 or -.
What if the maximum eigen value is zero?
Essentially, it hints that there’s no strong
community pattern. Not necessary to split the
network, which is a nice property.
Here, the spectral partitioning is forced to
split the network into approximately equal-
size clusters.
Extensions
Divisive clustering
K - partitioning…
Comments
I thought spectral clustering is the end of clustering. But
here a new measure Modularity is proposed and found to
be working very well, which confirms that “research is
endless”, or “no last bug”.
Since Graph Laplacian and Modularity matrix both boils
down to a eigen value problem, is there any innate
connection between these two measures?
How could it work if we apply it directly to some classic
data representation?
Extend modularity to relational data could be a promising
direction.
There could be more opportunities than “wheels” in social
computing.
Scalability is really a big issue.
References
M.E.J.Newman, Finding community structure in
networks using the eigenvectors of matrices, Phys.
Rev. , 2006
M. E. J. Newman, M. Girvan, Finding and
evaluating community structure in networks,
Phys. Rev. 2004

Contenu connexe

Tendances

Recomendation system: Community Detection Based Recomendation System using Hy...
Recomendation system: Community Detection Based Recomendation System using Hy...Recomendation system: Community Detection Based Recomendation System using Hy...
Recomendation system: Community Detection Based Recomendation System using Hy...Rajul Kukreja
 
Network sampling, community detection
Network sampling, community detectionNetwork sampling, community detection
Network sampling, community detectionroberval mariano
 
Social network analysis basics
Social network analysis basicsSocial network analysis basics
Social network analysis basicsPradeep Kumar
 
Community detection in graphs
Community detection in graphsCommunity detection in graphs
Community detection in graphsNicola Barbieri
 
Action and content based Community Detection in Social Networks
Action and content based Community Detection in Social NetworksAction and content based Community Detection in Social Networks
Action and content based Community Detection in Social Networksritesh_11
 
Exploratory social network analysis with pajek
Exploratory social network analysis with pajekExploratory social network analysis with pajek
Exploratory social network analysis with pajekTHomas Plotkowiak
 
Detecting Community Structures in Social Networks by Graph Sparsification
Detecting Community Structures in Social Networks by Graph SparsificationDetecting Community Structures in Social Networks by Graph Sparsification
Detecting Community Structures in Social Networks by Graph SparsificationSatyaki Sikdar
 
16 zaman nips10_workshop_v2
16 zaman nips10_workshop_v216 zaman nips10_workshop_v2
16 zaman nips10_workshop_v2talktoharry
 
Graph Community Detection Algorithm for Distributed Memory Parallel Computing...
Graph Community Detection Algorithm for Distributed Memory Parallel Computing...Graph Community Detection Algorithm for Distributed Memory Parallel Computing...
Graph Community Detection Algorithm for Distributed Memory Parallel Computing...Alexander Pozdneev
 
Taxonomy and survey of community
Taxonomy and survey of communityTaxonomy and survey of community
Taxonomy and survey of communityIJCSES Journal
 
AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS
AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS
AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS csandit
 
Using content and interactions for discovering communities in
Using content and interactions for discovering communities inUsing content and interactions for discovering communities in
Using content and interactions for discovering communities inmoresmile
 
Overlapping community detection survey
Overlapping community detection surveyOverlapping community detection survey
Overlapping community detection survey煜林 车
 
Community Detection
Community DetectionCommunity Detection
Community DetectionIlio Catallo
 
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK csandit
 
IRJET - Exploring Agglomerative Spectral Clustering Technique Employed for...
IRJET - 	  Exploring Agglomerative Spectral Clustering Technique Employed for...IRJET - 	  Exploring Agglomerative Spectral Clustering Technique Employed for...
IRJET - Exploring Agglomerative Spectral Clustering Technique Employed for...IRJET Journal
 
Formulation of modularity factor for community detection applying
Formulation of modularity factor for community detection applyingFormulation of modularity factor for community detection applying
Formulation of modularity factor for community detection applyingIAEME Publication
 

Tendances (20)

Recomendation system: Community Detection Based Recomendation System using Hy...
Recomendation system: Community Detection Based Recomendation System using Hy...Recomendation system: Community Detection Based Recomendation System using Hy...
Recomendation system: Community Detection Based Recomendation System using Hy...
 
Network sampling, community detection
Network sampling, community detectionNetwork sampling, community detection
Network sampling, community detection
 
Social network analysis basics
Social network analysis basicsSocial network analysis basics
Social network analysis basics
 
Community detection in graphs
Community detection in graphsCommunity detection in graphs
Community detection in graphs
 
Action and content based Community Detection in Social Networks
Action and content based Community Detection in Social NetworksAction and content based Community Detection in Social Networks
Action and content based Community Detection in Social Networks
 
Community detection
Community detectionCommunity detection
Community detection
 
Exploratory social network analysis with pajek
Exploratory social network analysis with pajekExploratory social network analysis with pajek
Exploratory social network analysis with pajek
 
17 Statistical Models for Networks
17 Statistical Models for Networks17 Statistical Models for Networks
17 Statistical Models for Networks
 
Detecting Community Structures in Social Networks by Graph Sparsification
Detecting Community Structures in Social Networks by Graph SparsificationDetecting Community Structures in Social Networks by Graph Sparsification
Detecting Community Structures in Social Networks by Graph Sparsification
 
16 zaman nips10_workshop_v2
16 zaman nips10_workshop_v216 zaman nips10_workshop_v2
16 zaman nips10_workshop_v2
 
Graph Community Detection Algorithm for Distributed Memory Parallel Computing...
Graph Community Detection Algorithm for Distributed Memory Parallel Computing...Graph Community Detection Algorithm for Distributed Memory Parallel Computing...
Graph Community Detection Algorithm for Distributed Memory Parallel Computing...
 
Taxonomy and survey of community
Taxonomy and survey of communityTaxonomy and survey of community
Taxonomy and survey of community
 
AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS
AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS
AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS
 
Using content and interactions for discovering communities in
Using content and interactions for discovering communities inUsing content and interactions for discovering communities in
Using content and interactions for discovering communities in
 
Overlapping community detection survey
Overlapping community detection surveyOverlapping community detection survey
Overlapping community detection survey
 
Community Detection
Community DetectionCommunity Detection
Community Detection
 
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK
 
IRJET - Exploring Agglomerative Spectral Clustering Technique Employed for...
IRJET - 	  Exploring Agglomerative Spectral Clustering Technique Employed for...IRJET - 	  Exploring Agglomerative Spectral Clustering Technique Employed for...
IRJET - Exploring Agglomerative Spectral Clustering Technique Employed for...
 
Formulation of modularity factor for community detection applying
Formulation of modularity factor for community detection applyingFormulation of modularity factor for community detection applying
Formulation of modularity factor for community detection applying
 
3 Centrality
3 Centrality3 Centrality
3 Centrality
 

En vedette

Community detection in social networks
Community detection in social networksCommunity detection in social networks
Community detection in social networksFrancisco Restivo
 
Group and Community Detection in Social Networks
Group and Community Detection in Social NetworksGroup and Community Detection in Social Networks
Group and Community Detection in Social NetworksKent State University
 
Decentralized social network
Decentralized social networkDecentralized social network
Decentralized social networkRichard Philip
 
Community Detection in Brain Networks
Community Detection in Brain NetworksCommunity Detection in Brain Networks
Community Detection in Brain NetworksManas Gaur
 
This isn't what I thought it was: community in the network age
This isn't what I thought it was: community in the network ageThis isn't what I thought it was: community in the network age
This isn't what I thought it was: community in the network ageNancy Wright White
 
The Network, the Community and the Self-Creativity
The Network, the Community and the Self-CreativityThe Network, the Community and the Self-Creativity
The Network, the Community and the Self-CreativityVince Cammarata
 
Analysis of the Evolution of Events on Online Social Networks
Analysis of the Evolution of Events on Online Social NetworksAnalysis of the Evolution of Events on Online Social Networks
Analysis of the Evolution of Events on Online Social NetworksMiguel Rebollo
 
Network centrality measures and their effectiveness
Network centrality measures and their effectivenessNetwork centrality measures and their effectiveness
Network centrality measures and their effectivenessemapesce
 
P k-nag-solution 2
P k-nag-solution 2P k-nag-solution 2
P k-nag-solution 2Nitesh Singh
 
Social media mining PPT
Social media mining PPTSocial media mining PPT
Social media mining PPTChhavi Mathur
 
Network ppt
Network pptNetwork ppt
Network ppthlalu861
 

En vedette (14)

Community detection in social networks
Community detection in social networksCommunity detection in social networks
Community detection in social networks
 
Group and Community Detection in Social Networks
Group and Community Detection in Social NetworksGroup and Community Detection in Social Networks
Group and Community Detection in Social Networks
 
Decentralized social network
Decentralized social networkDecentralized social network
Decentralized social network
 
Community Detection in Brain Networks
Community Detection in Brain NetworksCommunity Detection in Brain Networks
Community Detection in Brain Networks
 
This isn't what I thought it was: community in the network age
This isn't what I thought it was: community in the network ageThis isn't what I thought it was: community in the network age
This isn't what I thought it was: community in the network age
 
The Network, the Community and the Self-Creativity
The Network, the Community and the Self-CreativityThe Network, the Community and the Self-Creativity
The Network, the Community and the Self-Creativity
 
Analysis of the Evolution of Events on Online Social Networks
Analysis of the Evolution of Events on Online Social NetworksAnalysis of the Evolution of Events on Online Social Networks
Analysis of the Evolution of Events on Online Social Networks
 
Network centrality measures and their effectiveness
Network centrality measures and their effectivenessNetwork centrality measures and their effectiveness
Network centrality measures and their effectiveness
 
P k-nag-solution 2
P k-nag-solution 2P k-nag-solution 2
P k-nag-solution 2
 
Social media mining PPT
Social media mining PPTSocial media mining PPT
Social media mining PPT
 
Network ppt
Network pptNetwork ppt
Network ppt
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysis
 
استاذ
استاذاستاذ
استاذ
 
Networking ppt
Networking ppt Networking ppt
Networking ppt
 

Similaire à Community detection in social networks[1]

Jürgens diata12-communities
Jürgens diata12-communitiesJürgens diata12-communities
Jürgens diata12-communitiesPascal Juergens
 
ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...Daniel Katz
 
Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors ijbbjournal
 
Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors ijbbjournal
 
Higher-order clustering coefficients at Purdue CSoI
Higher-order clustering coefficients at Purdue CSoIHigher-order clustering coefficients at Purdue CSoI
Higher-order clustering coefficients at Purdue CSoIAustin Benson
 
Distribution of maximal clique size of the
Distribution of maximal clique size of theDistribution of maximal clique size of the
Distribution of maximal clique size of theIJCNCJournal
 
Scott Complex Networks
Scott Complex NetworksScott Complex Networks
Scott Complex Networksjilung hsieh
 
Lecture 5 - Qunatifying a Network.pdf
Lecture 5 - Qunatifying a Network.pdfLecture 5 - Qunatifying a Network.pdf
Lecture 5 - Qunatifying a Network.pdfclararoumany1
 
Community structure in social and biological structures
Community structure in social and biological structuresCommunity structure in social and biological structures
Community structure in social and biological structuresMaxim Boiko Savenko
 
O N T HE D ISTRIBUTION OF T HE M AXIMAL C LIQUE S IZE F OR T HE V ERTICES IN ...
O N T HE D ISTRIBUTION OF T HE M AXIMAL C LIQUE S IZE F OR T HE V ERTICES IN ...O N T HE D ISTRIBUTION OF T HE M AXIMAL C LIQUE S IZE F OR T HE V ERTICES IN ...
O N T HE D ISTRIBUTION OF T HE M AXIMAL C LIQUE S IZE F OR T HE V ERTICES IN ...csandit
 
Socialnetworkanalysis (Tin180 Com)
Socialnetworkanalysis (Tin180 Com)Socialnetworkanalysis (Tin180 Com)
Socialnetworkanalysis (Tin180 Com)Tin180 VietNam
 
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKSSCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKSIJDKP
 
Scalable Local Community Detection with Mapreduce for Large Networks
Scalable Local Community Detection with Mapreduce for Large NetworksScalable Local Community Detection with Mapreduce for Large Networks
Scalable Local Community Detection with Mapreduce for Large NetworksIJDKP
 
CS6010 Social Network Analysis Unit V
CS6010 Social Network Analysis Unit VCS6010 Social Network Analysis Unit V
CS6010 Social Network Analysis Unit Vpkaviya
 
network mining and representation learning
network mining and representation learningnetwork mining and representation learning
network mining and representation learningsun peiyuan
 
DISTRIBUTION OF MAXIMAL CLIQUE SIZE UNDER THE WATTS-STROGATZ MODEL OF EVOLUTI...
DISTRIBUTION OF MAXIMAL CLIQUE SIZE UNDER THE WATTS-STROGATZ MODEL OF EVOLUTI...DISTRIBUTION OF MAXIMAL CLIQUE SIZE UNDER THE WATTS-STROGATZ MODEL OF EVOLUTI...
DISTRIBUTION OF MAXIMAL CLIQUE SIZE UNDER THE WATTS-STROGATZ MODEL OF EVOLUTI...ijfcstjournal
 
Distribution of maximal clique size under
Distribution of maximal clique size underDistribution of maximal clique size under
Distribution of maximal clique size underijfcstjournal
 
Wanted: a larger, different kind of box
Wanted: a larger, different kind of boxWanted: a larger, different kind of box
Wanted: a larger, different kind of boxLina Martinsson Achi
 

Similaire à Community detection in social networks[1] (20)

Jürgens diata12-communities
Jürgens diata12-communitiesJürgens diata12-communities
Jürgens diata12-communities
 
ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...
 
Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors
 
Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors
 
Higher-order clustering coefficients at Purdue CSoI
Higher-order clustering coefficients at Purdue CSoIHigher-order clustering coefficients at Purdue CSoI
Higher-order clustering coefficients at Purdue CSoI
 
Distribution of maximal clique size of the
Distribution of maximal clique size of theDistribution of maximal clique size of the
Distribution of maximal clique size of the
 
Scott Complex Networks
Scott Complex NetworksScott Complex Networks
Scott Complex Networks
 
Lecture 5 - Qunatifying a Network.pdf
Lecture 5 - Qunatifying a Network.pdfLecture 5 - Qunatifying a Network.pdf
Lecture 5 - Qunatifying a Network.pdf
 
Document 8 1.pdf
Document 8 1.pdfDocument 8 1.pdf
Document 8 1.pdf
 
Community structure in social and biological structures
Community structure in social and biological structuresCommunity structure in social and biological structures
Community structure in social and biological structures
 
O N T HE D ISTRIBUTION OF T HE M AXIMAL C LIQUE S IZE F OR T HE V ERTICES IN ...
O N T HE D ISTRIBUTION OF T HE M AXIMAL C LIQUE S IZE F OR T HE V ERTICES IN ...O N T HE D ISTRIBUTION OF T HE M AXIMAL C LIQUE S IZE F OR T HE V ERTICES IN ...
O N T HE D ISTRIBUTION OF T HE M AXIMAL C LIQUE S IZE F OR T HE V ERTICES IN ...
 
Socialnetworkanalysis (Tin180 Com)
Socialnetworkanalysis (Tin180 Com)Socialnetworkanalysis (Tin180 Com)
Socialnetworkanalysis (Tin180 Com)
 
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKSSCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS
 
Scalable Local Community Detection with Mapreduce for Large Networks
Scalable Local Community Detection with Mapreduce for Large NetworksScalable Local Community Detection with Mapreduce for Large Networks
Scalable Local Community Detection with Mapreduce for Large Networks
 
CS6010 Social Network Analysis Unit V
CS6010 Social Network Analysis Unit VCS6010 Social Network Analysis Unit V
CS6010 Social Network Analysis Unit V
 
F0963440
F0963440F0963440
F0963440
 
network mining and representation learning
network mining and representation learningnetwork mining and representation learning
network mining and representation learning
 
DISTRIBUTION OF MAXIMAL CLIQUE SIZE UNDER THE WATTS-STROGATZ MODEL OF EVOLUTI...
DISTRIBUTION OF MAXIMAL CLIQUE SIZE UNDER THE WATTS-STROGATZ MODEL OF EVOLUTI...DISTRIBUTION OF MAXIMAL CLIQUE SIZE UNDER THE WATTS-STROGATZ MODEL OF EVOLUTI...
DISTRIBUTION OF MAXIMAL CLIQUE SIZE UNDER THE WATTS-STROGATZ MODEL OF EVOLUTI...
 
Distribution of maximal clique size under
Distribution of maximal clique size underDistribution of maximal clique size under
Distribution of maximal clique size under
 
Wanted: a larger, different kind of box
Wanted: a larger, different kind of boxWanted: a larger, different kind of box
Wanted: a larger, different kind of box
 

Community detection in social networks[1]

  • 1. Community Detection in Social Networks Lei Tang
  • 2. Properties of Complex Network Power Law Community Structure Small World
  • 3. Why Community Detection? Communities in a citation network might represent related papers on a single topic; Communities on the web might represent pages of related topics; Community can be considered as a summary of the whole network thus easy to visualize and understand. Sometimes, community can reveal the properties without releasing the individual privacy information.
  • 5. Community Detection = Clustering? As I understand, community detection is essentially clustering. But why so many works on Community Detection? (in physical review, KDD, WWW) The network data pose challenges to classical clustering method.
  • 6. Difference Clustering works on the distance or similarity matrix (k- means, hierarchical clustering, spectral clustering) Network data tends to be “discrete”, leading to algorithms using the graph property directly (k-clique, quasi-clique, vertex-betweenness, edge-betweeness etc.) Real-world network is large scale! Sometimes, even n^2 in unbearable for efficiency or space (local/distributed clustering, network approximation, sampling method)
  • 7. Outline Two recent community detection methods Clustering based on shortest-path betweenness Clustering based on network modularity
  • 8. Basic Idea A simple divisive strategy: Repeat 1. Find out one “inter-community” edge 2. Remove the edge 3. Check if there’s any disconnected components (which corresponds to a community)
  • 9. How to measure “inter-community” If two communities are joined by a few inter-community edges, then all the paths from one community to another must pass the edges. Various measures: Edge Betweenness: find the shortest paths between all pairs of nodes and count how many run along each edge. Random Walk betweenness. Current-flow betweenness
  • 10. Shortest-path betweenness Computation could be expensive: calculating the shortest path between one pair is O(m), and there are O(n^2) pairs. Could be optimized to O(mn) Simple case: only one shortest path When there is only one single path between the Source S and other vertex, then those paths form a tree. Bottom-up: start from the leaves, assign edges to 1. Count of parent edge = sum (count of children edge)+1
  • 11. Multiple shortest path First compute the number of paths from source to other vertex Then assign a proper weight for the path counts sum of the betweenness =.number of reachable vertices.
  • 12. Calculate #shortest path 1.Initial distance W:Number of shortest paths
  • 13. Update edge weight Edge weight
  • 14. Time Complexity O(mn) in each iteration. Could be accelerated by noting that only the nodes in the connected component would be affected. Some other techniques developed: sampling strategy to approximate the betweenness; use specific network index for speed.
  • 15. Obtain a hierarchical tree, use modularity To determine the number of clusters.
  • 16. Modularity Spectral clustering essentially tries to minimize the number edges between groups. Modularity consider the number edges which is smaller than expected. If the difference is significantly large, there’s a community structure inside. The larger, the better.
  • 17. Quiz Given a network of m edges, for two nodes with degree ki, kj, what is the expected edges between these two nodes?
  • 18. Modularity Calculation Modularity can be used to determine the number of clusters, why not maximize it directly? Unfortunately, it’s NP-hard
  • 19. Relaxation Eigen Value Problem! Modularity Matrix Betai is the eigen value of the Eigen vector ui of modularity matrix B
  • 20. Properties of Modularity Matrix (1,1,…1) is an eigen vector with zero eigen value. Different from graph Laplacian, the eigen value of modularity matrix could be +, 0 or -. What if the maximum eigen value is zero? Essentially, it hints that there’s no strong community pattern. Not necessary to split the network, which is a nice property.
  • 21. Here, the spectral partitioning is forced to split the network into approximately equal- size clusters.
  • 23. Comments I thought spectral clustering is the end of clustering. But here a new measure Modularity is proposed and found to be working very well, which confirms that “research is endless”, or “no last bug”. Since Graph Laplacian and Modularity matrix both boils down to a eigen value problem, is there any innate connection between these two measures? How could it work if we apply it directly to some classic data representation? Extend modularity to relational data could be a promising direction. There could be more opportunities than “wheels” in social computing. Scalability is really a big issue.
  • 24. References M.E.J.Newman, Finding community structure in networks using the eigenvectors of matrices, Phys. Rev. , 2006 M. E. J. Newman, M. Girvan, Finding and evaluating community structure in networks, Phys. Rev. 2004