DataXDay - Exploring graphs: looking for communities & leaders
11. @DataXDay
Some use cases of graph theory
Spreading
• Determine the speed of a spreading
phenomenon
• How to speed it up or to slow it down?
Viral marketing, vaccination campaigns
Dynamics & optimisation
• Shortest path between two nodes?
• Effects of modifying the structure?
Transportation systems, social networks
Domino effects
• Resilience to random failures?
• And to targeted attacks?
Security systems, economics, infrastructure
Structural importance
• Which nodes are the most important or
authoritative? Who are the leaders?
Google PageRank algorithm
© Quantmetry 2018 | Distribution prohibited without permission
16. @DataXDay
Two examples
Girvan-Newman: a good algorithm on small graphs (<500 nodes), but with a very high complexity
• Cut the bridges: iteratively remove the links with the highest betweenness
• Communities are found when the graph becomes disconnected
Walktrap: much more efficient on large graphs
• Random walk on a network: a path following randomly chosen edges of the graph
• Community « strength »: proportional to the time a random walker spends inside it
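The "cut the bridges" idea can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not an optimised implementation: the graph is taken to be unweighted and undirected, stored as a dict of neighbour sets, and the names `edge_betweenness`, `components`, `first_split` and `two_triangles` are hypothetical, chosen for this example. The sketch stops at the first split, whereas the full procedure keeps removing links.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path shares per edge.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Every pair is counted from both endpoints; the ranking is unaffected.
    return score


def components(adj):
    """Connected components, as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def first_split(adj):
    """Remove the highest-betweenness link until the graph disconnects."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    while len(components(adj)) == 1:
        score = edge_betweenness(adj)
        if not score:  # no links left to remove
            break
        u, v = max(score, key=score.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Toy graph: two triangles joined by a single bridge (2, 3).
two_triangles = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
print(first_split(two_triangles))
```

Betweenness is recomputed after every removal, which is what makes this approach expensive on anything but small graphs.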
18. @DataXDay
And the winner is... the Louvain algorithm
✅ Able to identify heterogeneous communities
✅ Efficient on large graphs: complexity O(N log N)
✅ Available in most graph analytics libraries: a good first choice
Modularity optimization
Density of edges inside vs outside clusters
$$Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$
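As a rough sketch of how the modularity Q could be evaluated for a given partition, the snippet below applies the formula term by term, reading A as the adjacency matrix, k_i as the degree of node i, m as the number of edges and c_i as the community of node i (the usual reading of these symbols). The function name `modularity`, the dict-of-sets graph representation and the toy graph are assumptions made for this example. The double loop is quadratic in the number of nodes, so it is meant for checking small cases only.

```python
def modularity(adj, community):
    """Modularity Q of a partition of an unweighted, undirected graph.

    adj: dict mapping each node to the set of its neighbours.
    community: dict mapping each node to its community label.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] != community[j]:
                continue  # delta(c_i, c_j) = 0
            a_ij = 1.0 if j in adj[i] else 0.0
            q += a_ij - len(adj[i]) * len(adj[j]) / (2 * m)
    return q / (2 * m)


# Toy graph: two triangles joined by a single bridge (2, 3).
two_triangles = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
by_triangle = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
all_together = {node: "a" for node in two_triangles}

print(modularity(two_triangles, by_triangle))   # 5/14, about 0.357
print(modularity(two_triangles, all_together))  # 0: a single cluster scores nothing
```

Splitting the graph along the bridge scores higher than lumping everything together, which is the kind of comparison a modularity-optimising algorithm makes at each step.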
Local to global greedy
From groups of nodes … to groups of clusters
19. @DataXDay
Testing the algorithms and measuring the performance
I observe the truth: the known communities
• I measure the ability to reconstruct real, known communities
• Example metric: Normalized Mutual Information
I create the truth: the Stochastic Block Model
• I define the probability for each pair of nodes to be connected
• In the simplest case:
$$p_{ij} = \begin{cases} A & \text{if } i, j \text{ are in the same community} \\ B < A & \text{otherwise} \end{cases}$$
• As a consequence, there are more links inside communities
• Many observations can be generated to test algorithms
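Both halves of this test protocol can be sketched in plain Python: drawing a graph from the simplest Stochastic Block Model, and scoring a detected partition against the planted one with Normalized Mutual Information. The names `sbm` and `nmi`, the parameters `p_in` and `p_out` (standing for A and B above) and the numeric values are assumptions made for this example. NMI has several normalisation conventions; this sketch divides by the arithmetic mean of the two entropies.

```python
import random
from collections import Counter
from itertools import combinations
from math import log


def sbm(sizes, p_in, p_out, seed=0):
    """Draw one graph from the simplest Stochastic Block Model.

    Returns (adj, labels): a dict of neighbour sets and the planted
    community label of each node.
    """
    rng = random.Random(seed)
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    adj = {i: set() for i in range(len(labels))}
    for i, j in combinations(range(len(labels)), 2):
        p = p_in if labels[i] == labels[j] else p_out
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj, labels


def nmi(a, b):
    """Normalized Mutual Information between two labelings of the same nodes."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mutual = sum(
        c / n * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    if h_a + h_b == 0:  # both labelings put every node in one group
        return 1.0
    return 2 * mutual / (h_a + h_b)


# One observation with two planted communities of 20 nodes each.
adj, truth = sbm([20, 20], p_in=0.5, p_out=0.05, seed=42)

# A detected partition would be compared with `truth`; label names do not matter.
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # identical partitions up to renaming
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # unrelated partitions
```

Changing `seed` yields a new observation from the same model, so an algorithm can be scored over many draws rather than on a single graph.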
23. @DataXDay
Different ways of measuring node importance
A local importance: the degree
Is the node « well connected »?
Count its number of direct neighbours
A global importance: the betweenness centrality
Is the node a « bridge »?
Count the number of shortest paths passing through it
A well-known, iterative metric: Google PageRank
Is the node connected to many important nodes?
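The three measures can be compared on a toy graph with a plain-Python sketch. The function names, the dict-of-sets representation, the damping factor of 0.85 and the fixed number of iterations are assumptions made for this example. Links are treated as undirected, so PageRank here walks each link in both directions.

```python
from collections import deque


def degree(adj):
    """Local importance: number of direct neighbours."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness(adj):
    """Global importance: shortest paths passing through each node (Brandes-style)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Undirected graph: every pair of endpoints was counted twice.
    return {v: b / 2 for v, b in score.items()}


def pagerank(adj, damping=0.85, iters=100):
    """Iterative importance: a node is important if important nodes link to it."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in adj}
        for v, nbrs in adj.items():
            # A node without links spreads its rank evenly over all nodes.
            targets = nbrs if nbrs else adj
            share = damping * rank[v] / len(targets)
            for w in targets:
                new[w] += share
        rank = new
    return rank


# Toy graph: a star with centre 0 and four leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(degree(star))
print(betweenness(star))
print(pagerank(star))
```

On the star all three measures agree that the centre is the most important node; they diverge on graphs where a low-degree node is the only link between two dense groups.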
31. The video of this presentation
will soon be available at dataxday.fr
Thanks to our sponsors
Stay tuned by following @DataXDay