Contextualized versus Structural Overlapping Communities in Social Media.
1. Lehrstuhl Informatik 5
(Information Systems)
Prof. Dr. M. Jarke
1
Learning Layers
Contextualized versus Structural Overlapping Community Structures in Social Media
Mohsen Shahriari, Ying Li, Ralf Klamma
This slide deck is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Contextualized versus Structural Overlapping Communities in Social Media
Mohsen Shahriari, Sabrina Haefele, Ralf Klamma
Advanced Community Information Systems (ACIS)
RWTH Aachen University, Germany
{shahriari, haefele, klamma}@dbis.rwth-aachen.de
Chair of Computer Science 5
RWTH Aachen University
Outline
Research background
– Necessity of community analysis
– Community detection
Literature & Challenges
Research questions
Baselines & Proposed Methods
Dataset & Metrics
Results
Conclusion & Future Work
Research Background: How to Characterize Networks
Power law
– Eligible for social network analysis
– Presence of hubs
Small-world property
Motifs
– Synchronizability, cooperativity, stability and robustness may depend on motif structures
Community structure
– Overlapping community structure
– Also supports other applications
– Scale up information
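A minimal sketch of two of the measures above, assuming an undirected network stored as a dict that maps each node to a set of neighbours (the representation and function names are illustrative, not from the slides). The empirical degree distribution is what one would inspect on a log-log plot for power-law behaviour; a high average clustering coefficient combined with short path lengths is the usual signature of a small world.

```python
from collections import Counter


def degree_distribution(adj):
    """Return P(k): the fraction of nodes that have degree k."""
    counts = Counter(len(neighbours) for neighbours in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for node, neighbours in adj.items():
        k = len(neighbours)
        if k < 2:
            continue
        # Count the links among this node's neighbours (each pair once).
        links = sum(1 for v in neighbours for w in neighbours if v < w and w in adj[v])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(degree_distribution(star))     # {1: 0.8, 4: 0.2}
print(average_clustering(triangle))  # 1.0
```

The star has one hub and no triangles, so its clustering is 0; the triangle is fully clustered.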
Figure: Degree distribution of the CiteULike user-tag network (source: networkscience.wordpress.com)
Source: Milgram experiment “The small world problem”
Source: networkscience.wordpress.com
Research Background: What Is an (Overlapping) Community?
Nodes are densely connected inside communities and sparsely connected between them
People with similar interests or needs (Preece, 2000)
Recent research: the overlaps between communities are dense (Yang & Leskovec, 2012)
(Girvan & Newman, 2002)
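To make the density-based definition concrete, this small sketch (a hypothetical helper, with the graph again as a dict of neighbour sets) compares the edge density inside a candidate community with the density of the edges leaving it. A community in the density sense has the first value clearly higher than the second.

```python
def community_densities(adj, community):
    """Return (internal edge density, density of edges leaving the community)."""
    n_in = len(community)
    n_out = len(adj) - n_in
    # Each internal edge is seen from both endpoints, hence the // 2.
    internal_edges = sum(1 for u in community for v in adj[u] if v in community) // 2
    cut_edges = sum(1 for u in community for v in adj[u] if v not in community)
    possible_internal = n_in * (n_in - 1) / 2
    internal = internal_edges / possible_internal if possible_internal else 0.0
    external = cut_edges / (n_in * n_out) if n_in and n_out else 0.0
    return internal, external


# Two triangles joined by the single edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(community_densities(adj, {0, 1, 2}))  # internal 1.0, external 1/9
```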
Research Background: What Is an (Overlapping) Community?
Some networks call for other definitions
Signed social networks: density and structural balance theory (Doreian, 2004)
Different interpretations of communities lead to different definitions
Figure: signed network with positive (+) and negative (-) edges
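In structural balance theory, a triangle is balanced when the product of its three edge signs is positive ("the enemy of my enemy is my friend"). The sketch below, a hypothetical helper with signs stored per unordered node pair, counts balanced and unbalanced triangles; signed-network community definitions build on this idea by looking for groups with positive ties inside and negative ties between them.

```python
from itertools import combinations


def triangle_balance(signs):
    """Count balanced and unbalanced triangles in a signed network.

    `signs` maps frozenset({u, v}) to +1 (positive tie) or -1 (negative tie).
    """
    nodes = sorted({n for edge in signs for n in edge})
    balanced = unbalanced = 0
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset(pair) for pair in ((a, b), (b, c), (a, c))]
        if all(e in signs for e in edges):
            product = signs[edges[0]] * signs[edges[1]] * signs[edges[2]]
            if product > 0:
                balanced += 1
            else:
                unbalanced += 1
    return balanced, unbalanced


# (+, -, -) on nodes 1, 2, 3 is balanced; (+, +, -) on nodes 3, 4, 5 is not.
signs = {
    frozenset({1, 2}): +1, frozenset({2, 3}): -1, frozenset({1, 3}): -1,
    frozenset({3, 4}): +1, frozenset({4, 5}): +1, frozenset({3, 5}): -1,
}
print(triangle_balance(signs))  # (1, 1)
```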
Research Background: What Is an (Overlapping) Community?
Communities may also form when people have ideas, innovations and thoughts to discuss
– Even when they do not know each other
Literature
Challenges Regarding Content-Based Overlapping Community Detection (OCD)
Little is known about the significance of content
– Community events, e.g., releases in an open source developer network
– Correlation of content and structural properties of social media
Few methods detect overlapping community structures
– Most detect only disjoint community structures
Most methods are not suitable for thread-based data structures
– They require extensive tuning
Most approaches do not work on the actual posts/content
– They mainly use attributes/tags
Research Questions
How are structural properties, such as the number of overlapping nodes, modularity and average community size, affected by contextualized similarities among users in question & answer social platforms?
Can adding content improve the performance of structure-based algorithms?
Structural/Content-Based OCD Approaches
First, we introduce the baselines used in this work
– Disassortative degree Mixing and Information Diffusion (DMID)
– Speaker-listener Label Propagation Algorithm (SLPA)
– Stanoev, Smilkov and Kocarev (SSK)
– Algorithm by Li, Zhang, Liu, Chen and Zhang (CLiZZ)
Then, we introduce the proposed content-based methods
– Cost function optimization clustering algorithm (CFOCA)
– Term community merging algorithm (TCMA)
– Combining content and structural values
Baseline Methods: Disassortative Degree Mixing and Information Diffusion (DMID)
Detecting the most influential nodes (leaders)
– Using the disassortative degree mixing property
– $AS_{ij} = \lvert \deg(i) - \deg(j) \rvert$ for every edge $(i, j)$, and $0$ otherwise
– Row-normalizing the disassortative matrix
– $T_{ij} = \dfrac{AS_{ij}}{\sum_{k=1}^{N} AS_{ik}}$
– Performing a random walk
– $DA^{t+1} = DA^{t} \times T$
– Computing the local leadership value
– Combining degree and disassortative value
– $LL_i = DA_i \times DRN_i$
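A minimal pure-Python sketch of the leader-detection steps above, assuming an undirected graph given as a dict of neighbour sets. The slide does not define $DRN_i$; here it is replaced by a hypothetical stand-in (degree divided by the maximum degree). The walk runs for a fixed number of steps from the uniform vector, and a node is reported as a candidate leader when its LL value exceeds that of all its neighbours. These are illustrative choices, not the exact DMID implementation.

```python
def dmid_leadership(adj, steps=50):
    """Sketch of DMID-style leadership values on an undirected graph.

    adj: dict mapping each node to a set of neighbours.
    Returns (LL values, candidate leaders).
    """
    nodes = sorted(adj)
    deg = {i: len(adj[i]) for i in nodes}
    # Disassortative matrix: AS_ij = |deg(i) - deg(j)| on edges, 0 elsewhere.
    AS = {i: {j: abs(deg[i] - deg[j]) for j in adj[i]} for i in nodes}
    # Row-normalise AS into the transition matrix T (all-zero rows stay zero).
    T = {}
    for i in nodes:
        row_sum = sum(AS[i].values())
        T[i] = {j: w / row_sum for j, w in AS[i].items()} if row_sum else {}
    # Random walk DA^{t+1} = DA^t x T, starting from the uniform distribution.
    da = {i: 1.0 / len(nodes) for i in nodes}
    for _ in range(steps):
        nxt = {i: 0.0 for i in nodes}
        for i in nodes:
            for j, p in T[i].items():
                nxt[j] += da[i] * p
        da = nxt
    # Hypothetical DRN_i: degree normalised by the maximum degree.
    max_deg = max(deg.values()) or 1
    ll = {i: da[i] * deg[i] / max_deg for i in nodes}
    # Illustrative leader rule: LL strictly higher than every neighbour's LL.
    leaders = [i for i in nodes if adj[i] and all(ll[i] > ll[j] for j in adj[i])]
    return ll, leaders


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
ll, leaders = dmid_leadership(star)
print(leaders)  # [0]
```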
Cascading behavior modeled as a network coordination game
$P_A(i) = \dfrac{\lvert \{\, j \in N(i) : j \text{ has behaviour } A \,\} \rvert}{\lvert N(i) \rvert}$
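In the standard network coordination game, a node switches to behaviour A once $P_A(i)$, the fraction of its neighbours already using A, reaches a threshold $q$ determined by the payoffs. The sketch below assumes a fixed $q$ and a given seed set (for example, a leader); the function name is hypothetical, and it only illustrates how such a cascade can grow a community outward from its seeds.

```python
def coordination_cascade(adj, seeds, q):
    """Spread behaviour A from `seeds` in a network coordination game.

    A node switches to A once P_A(i), the fraction of its neighbours that
    already have A, reaches the threshold q. Switches are never undone.
    """
    adopted = set(seeds)
    changed = True
    while changed:
        changed = False
        for i in sorted(adj):
            if i in adopted or not adj[i]:
                continue
            p_a = sum(1 for j in adj[i] if j in adopted) / len(adj[i])
            if p_a >= q:
                adopted.add(i)
                changed = True
    return adopted


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(sorted(coordination_cascade(path, {0}, q=0.5)))  # [0, 1, 2, 3]
print(sorted(coordination_cascade(path, {0}, q=0.6)))  # [0]
```

A lower threshold lets the behaviour cross nodes with only half of their neighbours converted; a higher one stops it at the first such node.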
Baseline Methods: Speaker-listener Label Propagation Algorithm (SLPA)
Extension of the label propagation algorithm
– Nodes can take multiple labels
Idea: speaker-listener information propagation process (mimics human communication)
Nodes store the labels they receive in a memory
Steps:
1. Each node's memory is initialized with a unique label
2. Repeat until a user-defined number of iterations is reached:
   1. Select a node as the listener
   2. Each neighbor (speaker) randomly selects a label from its memory
   3. The listener accepts one of the propagated labels according to a rule (e.g., the most popular label)
3. Post-processing phase to identify the communities
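The steps above can be sketched in pure Python as follows. Each speaker picks a label with probability proportional to its count in the speaker's memory, the listener keeps the most popular label it heard, and post-processing keeps labels whose relative frequency in a node's memory is at least `r`. The parameter names and default values are assumptions of this sketch, and the removal of nested communities done in full SLPA post-processing is omitted.

```python
import random
from collections import Counter


def slpa(adj, iterations=20, r=0.1, seed=0):
    """Sketch of SLPA on an undirected graph given as node -> set of neighbours."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    # Step 1: every node's memory starts with its own unique label.
    memory = {i: Counter({i: 1}) for i in nodes}
    # Step 2: in each iteration every node acts as listener once, in random order.
    for _ in range(iterations):
        order = nodes[:]
        rng.shuffle(order)
        for listener in order:
            if not adj[listener]:
                continue
            heard = []
            for speaker in sorted(adj[listener]):
                # Speaking rule: pick a label with probability proportional to its count.
                labels, counts = zip(*memory[speaker].items())
                heard.append(rng.choices(labels, weights=counts)[0])
            # Listening rule: store the most popular label heard in this round.
            memory[listener][Counter(heard).most_common(1)[0][0]] += 1
    # Step 3: post-processing keeps labels whose relative frequency is at least r.
    communities = {}
    for i in nodes:
        total = sum(memory[i].values())
        for label, count in memory[i].items():
            if count / total >= r:
                communities.setdefault(label, set()).add(i)
    return list(communities.values())


two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
print(slpa(two_triangles))
```

Because a node may keep several labels above `r`, it can end up in more than one community, which is what makes SLPA an overlapping method.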
Baseline Methods: Stanoev, Smilkov and Kocarev (SSK)
An algorithm based on influence dynamics and membership computation
– Relationships of nodes and their influences are more important than direct connections
– Proxies among nodes are better established when triangles exist among them
Computing a transitive link matrix from both the adjacency matrix and triangle occurrences
Computing the membership of nodes to leaders
– Weighted average of the neighbors' memberships
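The slide describes SSK only at a high level, so this sketch makes two assumptions that are not taken from the original algorithm: the transitive link weight of an edge is 1 plus the number of triangles it closes, and the leaders are passed in rather than found by SSK's influence dynamics. It then illustrates the membership step named above: each non-leader's membership vector is repeatedly set to the weighted average of its neighbours' vectors, while leaders keep a fixed membership of 1 to themselves.

```python
def ssk_memberships(adj, leaders, iterations=100):
    """Propagate leader memberships as weighted neighbour averages.

    adj: node -> set of neighbours; leaders: list of leader nodes (given here).
    Returns node -> list of membership values, one per leader.
    """
    nodes = sorted(adj)
    leader_set = set(leaders)
    # Assumed transitive link weight: 1 for the edge plus 1 per triangle it closes.
    weight = {i: {j: 1 + len(adj[i] & adj[j]) for j in adj[i]} for i in nodes}
    k = len(leaders)
    member = {i: [0.0] * k for i in nodes}
    for idx, leader in enumerate(leaders):
        member[leader][idx] = 1.0
    for _ in range(iterations):
        updated = {}
        for i in nodes:
            if i in leader_set or not adj[i]:
                updated[i] = member[i]  # leaders (and isolated nodes) stay fixed
                continue
            total = sum(weight[i].values())
            updated[i] = [
                sum(weight[i][j] * member[j][c] for j in adj[i]) / total
                for c in range(k)
            ]
        member = updated
    return member


path = {0: {1}, 1: {0, 2}, 2: {1}}
print(ssk_memberships(path, leaders=[0, 2])[1])  # [0.5, 0.5]
```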
Baseline Methods: CLiZZ
Two-phase algorithm
– Identifying influential nodes based on their influence range
– Influence ranges are computed from shortest-path distances
– Computing the membership values of nodes using an updating rule
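The slide does not give CLiZZ's formulas, so the following is only a sketch of the two phases under stated assumptions. The influence of node $i$ is taken as a Gaussian-decay sum $\sum_j e^{-(d_{ij}/\delta)^2}$ over nodes within the cutoff $\lfloor 3\delta/\sqrt{2} \rfloor$, a leader is a node more influential than all its neighbours, and memberships are updated by adding neighbours' vectors and renormalising. The parameter `delta`, the update rule and the function names are hypothetical.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def clizz_sketch(adj, delta=1.0, iterations=20):
    nodes = sorted(adj)
    cutoff = math.floor(3 * delta / math.sqrt(2))  # assumed influence range
    # Phase 1: influence of each node on the nodes within its range.
    phi = {}
    for i in nodes:
        dist = bfs_distances(adj, i)
        phi[i] = sum(math.exp(-(d / delta) ** 2) for d in dist.values() if 0 < d <= cutoff)
    leaders = [i for i in nodes if adj[i] and all(phi[i] > phi[j] for j in adj[i])]
    # Phase 2: iterative membership update (assumed rule: own + neighbours, normalised).
    k = len(leaders)
    member = {i: [0.0] * k for i in nodes}
    for idx, leader in enumerate(leaders):
        member[leader][idx] = 1.0
    for _ in range(iterations):
        updated = {}
        for i in nodes:
            if i in leaders:
                updated[i] = member[i]
                continue
            raw = [member[i][c] + sum(member[j][c] for j in adj[i]) for c in range(k)]
            total = sum(raw)
            updated[i] = [v / total for v in raw] if total else raw
        member = updated
    return leaders, member


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
leaders, member = clizz_sketch(star)
print(leaders, member[3])  # [0] [1.0]
```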
Proposed Content-Based Methods: Feature Creation Phase
Term matrix
– Constructed from the users' threads
– Weighted using tf-idf
Figure: threads → tf-idf → term matrix (columns w1, w2, w3, …)
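A minimal sketch of the feature-creation step, assuming each row of the term matrix aggregates the tokens of all threads a user posted in, and using the common tf-idf variant tf = count / length and idf = log(N / df), where N is the number of users. Tokenisation, the idf variant and the function name are assumptions; other smoothing variants are equally plausible.

```python
import math
from collections import Counter


def tfidf_matrix(user_tokens):
    """Build a user x term tf-idf matrix.

    user_tokens: dict user -> list of tokens from all threads the user posted in.
    Returns (vocabulary, dict user -> row of tf-idf weights in vocabulary order).
    """
    users = sorted(user_tokens)
    vocab = sorted({t for tokens in user_tokens.values() for t in tokens})
    n = len(users)
    # Document frequency: in how many users' texts each term appears.
    df = Counter(t for u in users for t in set(user_tokens[u]))
    idf = {t: math.log(n / df[t]) for t in vocab}
    matrix = {}
    for u in users:
        tf = Counter(user_tokens[u])
        length = len(user_tokens[u])
        matrix[u] = [tf[t] / length * idf[t] if length else 0.0 for t in vocab]
    return vocab, matrix


vocab, m = tfidf_matrix({"alice": ["jmol", "applet", "applet"], "bob": ["jmol", "script"]})
print(vocab)          # ['applet', 'jmol', 'script']
print(m["alice"][1])  # 0.0, since "jmol" occurs for every user
```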
Cost Function Optimization Clustering Algorithm (CFOCA)
Minimization of the costs
Cost function J based on cosine similarity
Updating the centroids using gradient descent
Modification for overlapping communities: a threshold on the distance to the other centroids
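The slide names the ingredients of CFOCA but not its exact formulas. This sketch assumes the cost $J = \sum_i (1 - \cos(x_i, c_{a(i)}))$, where $a(i)$ is the closest centroid of row $x_i$, takes plain gradient steps on the centroids, and, for overlap, lets a row also join every other community whose centroid lies within a cosine-distance threshold. The learning rate, deterministic initialisation, threshold value and function names are all hypothetical.

```python
import math


def _norm(v):
    return math.sqrt(sum(a * a for a in v))


def cos_sim(x, c):
    nx, nc = _norm(x), _norm(c)
    if nx == 0 or nc == 0:
        return 0.0
    return sum(a * b for a, b in zip(x, c)) / (nx * nc)


def cos_grad_wrt_centroid(x, c):
    """Gradient of cos(x, c) with respect to the centroid c (x, c non-zero)."""
    nx, nc = _norm(x), _norm(c)
    dot = sum(a * b for a, b in zip(x, c))
    return [a / (nx * nc) - dot * b / (nx * nc ** 3) for a, b in zip(x, c)]


def cfoca_sketch(X, k, lr=0.1, epochs=100, threshold=0.3):
    # Deterministic initialisation: k evenly spaced, assumed non-zero rows of X.
    step = max(1, len(X) // k)
    centroids = [list(X[i * step]) for i in range(k)]
    for _ in range(epochs):
        assign = [max(range(k), key=lambda c: cos_sim(x, centroids[c])) for x in X]
        for c in range(k):
            grad = [0.0] * len(X[0])
            for x, a in zip(X, assign):
                if a == c and _norm(x) > 0:
                    for d, g in enumerate(cos_grad_wrt_centroid(x, centroids[c])):
                        grad[d] += g
            # Descent on J = sum(1 - cos) is ascent on the summed cosine similarity.
            centroids[c] = [v + lr * g for v, g in zip(centroids[c], grad)]
    # Overlap: each row joins its closest centroid and every centroid within the threshold.
    cover = [set() for _ in range(k)]
    for i, x in enumerate(X):
        dist = [1 - cos_sim(x, c) for c in centroids]
        best = min(range(k), key=dist.__getitem__)
        for c in range(k):
            if c == best or dist[c] <= threshold:
                cover[c].add(i)
    return centroids, cover


X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
_, cover = cfoca_sketch(X, k=2)
print(cover)  # [{0, 1}, {2, 3}]
```

Raising `threshold` makes rows that sit between two centroids join both communities, which is the overlap mechanism the slide describes.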
Term Community Merging Algorithm (TCMA)
Two phases
– Compute one community per word
– Refine the communities using the overlap coefficient
Figure: term matrix (columns w1, w2, w3, …)
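A sketch of the two TCMA phases under stated assumptions: a user joins a word's community when their term-matrix weight for that word exceeds `min_weight`, and two communities are merged when their overlap coefficient $|A \cap B| / \min(|A|, |B|)$ reaches `merge_threshold`. Both thresholds and all function names are hypothetical; the example uses a small made-up term matrix with users u1 to u3.

```python
def overlap_coefficient(a, b):
    """|A ∩ B| / min(|A|, |B|); 0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def tcma_sketch(matrix, vocab, min_weight=0.0, merge_threshold=0.5):
    """matrix: dict user -> row of term weights in `vocab` order."""
    # Phase 1: one community per word, holding the users who use that word.
    communities = []
    for col in range(len(vocab)):
        members = {u for u, row in matrix.items() if row[col] > min_weight}
        if members:
            communities.append(members)
    # Phase 2: merge any pair whose overlap coefficient reaches the threshold,
    # restarting the scan after each merge until no pair qualifies.
    merged = True
    while merged:
        merged = False
        for i in range(len(communities)):
            for j in range(i + 1, len(communities)):
                if overlap_coefficient(communities[i], communities[j]) >= merge_threshold:
                    other = communities.pop(j)  # j > i, so index i is unaffected
                    communities[i] = communities[i] | other
                    merged = True
                    break
            if merged:
                break
    return communities


matrix = {"u1": [0.23, 0.5, 0.0], "u2": [0.8, 0.0, 1.0], "u3": [0.0, 1.2, 0.59]}
vocab = ["w1", "w2", "w3"]
print(tcma_sketch(matrix, vocab))                            # one merged community
print(len(tcma_sketch(matrix, vocab, merge_threshold=0.6)))  # 3
```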
Content-Based Weighting Method
Generate two weights from content
Use structural OCD algorithms such as DMID, SSK and CLiZZ to compute communities
Figure: threads, user pair (r, s) and term matrix (columns w1, w2, w3, …)
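The slide does not spell out how the content weights are computed. One common choice, assumed here, is the cosine similarity between the term-matrix rows of the two users r and s joined by an edge; the resulting weighted graph could then be passed to a weighted variant of an OCD algorithm. The function names and the example matrix are hypothetical.

```python
import math


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def content_weights(edges, matrix):
    """Weight each existing edge (r, s) by the cosine similarity of the users' term rows."""
    return {(r, s): cosine(matrix[r], matrix[s]) for r, s in edges}


matrix = {"r": [0.23, 0.5, 0.0], "s": [0.8, 0.0, 1.0], "t": [0.46, 1.0, 0.0]}
w = content_weights([("r", "s"), ("r", "t")], matrix)
print(w)  # ("r", "t") is close to 1.0, since the rows are parallel
```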
Datasets and Metrics
Jmol dataset
– Forum discussions about Jmol, a Java tool for molecular modeling of chemical structures
– Open source development
– 2002–2012
– Publicly available at https://github.com/rwth-acis/REST-OCDServices/wiki/Jmol-Dataset
Combined modularity
– Takes both content and density into account
Number of overlapping nodes and average community size, used to extract further information
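The slides do not give the formula for combined modularity, so this sketch covers only the two simpler statistics. It assumes a cover is a list of node sets and counts a node as overlapping when it belongs to more than one community; the function name is hypothetical.

```python
from collections import Counter


def cover_statistics(cover):
    """Return (number of overlapping nodes, average community size) of a cover.

    cover: list of communities, each a set of nodes; a node is overlapping
    when it belongs to more than one community.
    """
    membership_counts = Counter(node for community in cover for node in community)
    overlapping = sum(1 for count in membership_counts.values() if count > 1)
    average_size = sum(len(c) for c in cover) / len(cover) if cover else 0.0
    return overlapping, average_size


print(cover_statistics([{1, 2, 3}, {3, 4}, {4, 5, 6, 7}]))  # (2, 3.0)
```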
Similarity Costs versus Average Community Size
Releases 1, 10 and 11 have low content similarity
Release 6 has the highest content similarity
– Its average community size is also the highest
Similarity Costs versus Number of Overlapping Nodes
Releases 2, 3, 4 and 5 have high content similarity and few overlapping nodes
Similarity costs are global measures
Similarity Costs versus Modularity
Inverse relation between content similarity and modularity
Average Community Size versus Releases
Content-based algorithms are useful when the structure of the network is missing
Content-based algorithms detect larger communities
Number of Overlapping Nodes versus Releases
Content-based methods may reflect the actual changes across releases
Content-based methods detect more overlapping nodes than structure-based methods
Conclusion & Future Work
Conclusion & Message:
Content has a significant effect on structure-based techniques
– Changes in community sizes, number of overlapping nodes and modularity
– Content-based methods detect larger communities with more overlap
Future Work:
Investigate local similarity costs
Improve time complexity
References
Ahn, Y.-Y., Bagrow, J. P., & Lehmann, S. (2010). Link communities reveal multiscale complexity in networks. Nature, 466(7307), 761–764. doi:10.1038/nature09182
Derényi, I., Palla, G., & Vicsek, T. (2005). Clique Percolation in Random Networks. Physical Review Letters, 94(16), 160202. doi:10.1103/PhysRevLett.94.160202
Ding, Z., Zhang, X., Sun, D., & Luo, B. (2016). Overlapping Community Detection based on Network Decomposition. Scientific Reports, 6, 24115. doi:10.1038/srep24115
Doreian, P. (2004). Evolution of Human Signed Networks, 1(2), 277–293. Retrieved from http://snap.stanford.edu/class/cs224w-readings/dorean04evolution.pdf
Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821–7826. doi:10.1073/pnas.122653799
Günnemann, S., Boden, B., Färber, I., & Seidl, T. (2013). Efficient Mining of Combined Subspace and Subgraph Clusters in Graphs with Feature Vectors. In Advances in Knowledge Discovery and Data Mining (pp. 261–275). Springer Berlin Heidelberg.
Günnemann, S., Färber, I., Boden, B., & Seidl, T. (2010). Subspace clustering meets dense subgraph mining: A synthesis of two paradigms. In The 10th International Conference on Data Mining.
Havemann, F., Heinz, M., Struck, A., & Gläser, J. (2011). Identification of overlapping communities and their hierarchy by locally calculating community-changing resolution levels. Journal of Statistical Mechanics: Theory and Experiment. doi:10.1088/1742-5468/2011/01/P01023
Preece, J. (2002). Supporting Community and Building Social Capital - Guest Editorial. Communications of the ACM, 45(4), 37–39.
Shahriari, M., Parekodi, S., & Klamma, R. (2015). Community-aware Ranking Algorithms for Expert Identification in Question-answer Forums. In Proceedings of the 15th International Conference on Knowledge Technologies and Data-driven Business (I-KNOW) (pp. 1–8). ACM. Retrieved from http://doi.acm.org/10.1145/2809563.2809592
Shen, H., Cheng, X., Cai, K., & Hu, M.-B. (2009). Detect overlapping and hierarchical community structure in networks. Physica A: Statistical Mechanics and its Applications, 388(8), 1706–1712. doi:10.1016/j.physa.2008.12.021
Yang, J., & Leskovec, J. (2012). Structure and Overlaps of Communities in Networks. CoRR, abs/1205.6228.
Speaker notes
Power law indicates whether the network is scale-free and whether hubs are present
Motifs
http://mathinsight.org/image/three_node_motifs
Cite the paper Community-Affiliation Graph Model for Overlapping Network Community Detection