2. Online Social Networks
8/27/2013 Muhammad Anis Uddin Nasir- Gossip-based Partitioning and Replication Middle-ware
• Vertices
• Edges
• Metadata
[Figure: example social graph whose vertices are people: Ioanna, Antonio, Vaidas, Aras, Vasia, Anis, Mudit, Manos, Leandro, Johan]
3. Existing Solutions
• Relational Databases
- MySQL Cluster
• Key-Value stores
- Cassandra, Amazon Dynamo
• Document Databases
- MongoDB, CouchDB
• Graph Databases
- Neo4j, Titan
4. Why Existing Solutions are not enough?
[Figure: example social graph with ten vertices, numbered 1 to 10]
5. Why Existing Solutions are not enough?
• Random Partitioning
• Social Request
- E.g., gather news feeds from all the friends
• Enforcing Data Locality
• Random partitioning can lead to full replication!
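The effect in the last bullet can be sketched on a toy graph. The slides in this excerpt do not define replication overhead, so the sketch assumes a common definition: the average number of replicas per vertex needed so that every master vertex finds a copy of each of its neighbours on its own server. The graph, the function name `replication_overhead`, and the three placements are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def replication_overhead(edges, placement):
    """Average replicas per vertex so that every master finds a local copy
    (master or replica) of each of its neighbours. Assumed definition."""
    replica_servers = defaultdict(set)  # vertex -> servers holding a replica
    for u, v in edges:
        if placement[u] != placement[v]:
            replica_servers[v].add(placement[u])  # u's server needs a copy of v
            replica_servers[u].add(placement[v])  # v's server needs a copy of u
    return sum(len(s) for s in replica_servers.values()) / len(placement)


# Hypothetical toy graph: two 4-cliques (communities) joined by one edge.
edges = (list(combinations(range(0, 4), 2))
         + list(combinations(range(4, 8), 2))
         + [(3, 4)])

by_community = {v: 0 if v < 4 else 1 for v in range(8)}  # locality-aware
by_parity = {v: v % 2 for v in range(8)}                 # ignores structure
rng = random.Random(0)
at_random = {v: rng.randrange(2) for v in range(8)}      # random partitioning

print(replication_overhead(edges, by_community))  # 0.25: only the bridge endpoints
print(replication_overhead(edges, by_parity))     # 1.0: every vertex is replicated
print(replication_overhead(edges, at_random))
```

With two servers, an overhead of 1.0 means every vertex has a replica on the other server, which is full replication. The placement that follows the community structure needs replicas only for the two endpoints of the bridging edge.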
[Figure: the ten-vertex graph after random partitioning; every master vertex (1 to 10) also has a replica (1’ to 10’), i.e., full replication]
6. Social Graphs are not Random
16. Evaluation- with datasets
• Number of Servers = 16, Replication factor = 2
[Figure: bar chart of replication overhead (y-axis, 0 to 12) for Random, SPAR, JA-BE-JA, and Gossip-based partitioning]
• >3x gain compared to Random Partitioning
• ≈2x gain compared to SPAR
17. Evaluation- with replication factor
• Number of Servers = 16
[Figure: bar chart of replication overhead (y-axis, 0 to 10) for replication factors f=0 and f=2]
• Random graphs generate the maximum replication overhead
• Real graphs generate the minimum replication overhead
• Data locality is achieved by fault-tolerance replicas
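One way to read the last point is that replicas created for fault tolerance can be placed on the servers that need them for locality, so they serve both purposes. The sketch below illustrates that accounting under this assumption only; the function `replicas_per_vertex` and the per-vertex counts are hypothetical and are not taken from the evaluation.

```python
def replicas_per_vertex(locality_replicas, f):
    """Replicas each vertex ends up with when it needs at least f copies for
    fault tolerance and those copies are placed where its neighbours live
    (assumed accounting, not confirmed by the slides)."""
    return {v: max(n, f) for v, n in locality_replicas.items()}


# Hypothetical per-vertex counts of replicas required for locality alone.
locality_replicas = {"a": 0, "b": 1, "c": 3}

with_f0 = replicas_per_vertex(locality_replicas, f=0)
with_f2 = replicas_per_vertex(locality_replicas, f=2)

# Replicas beyond the fault-tolerance minimum: the cost that is still
# attributable to locality once f copies are already paid for.
extra_f2 = sum(n - 2 for n in with_f2.values())
print(with_f0, with_f2, extra_f2)
```

In this toy case only vertex "c" needs a replica beyond the two that fault tolerance already requires.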
18. Evaluation- with servers
• Replication factor = 2
[Figure: replication overhead (y-axis, 0 to 20) versus number of servers (x-axis: 8, 16, 32, 64) on the WSON-Facebook dataset, for Random, SPAR, JA-BE-JA, and Gossip-based partitioning; a second panel with the same axes shows Gossip-based alone]
• Gossip-based generates the minimum replication overhead
• Replication overhead increases non-linearly
• >4x gain compared to Random Partitioning
19. Evaluation- dynamicity
• Number of Servers = 16, Replication factor = 2
[Figure: two panels, SNAP-Twitter and SNAP-Facebook, each plotting replication overhead (y-axis, 0.2 to 0.45) against number of cycles (x-axis, 0 to roughly 1000)]
• Spikes show bulk edge addition
• Algorithm stabilization
• Transition state, i.e., reducing the number of replicas after new edge additions
20. Conclusion
• Random partitioning does not provide an efficient solution for Online Social Networks
• Minimizing replicas can help to achieve better partitioning
• A gossip-based heuristic was proposed to solve the minimization problem while achieving the global optimum
• The algorithm is able to handle different datasets and adjusts to the dynamic nature of OSNs
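The slides that describe the heuristic itself are not part of this excerpt, so the following is not the author's algorithm. It is a generic, simplified sketch of the kind of peer-sampling local search that a gossip-based partitioner might use: in each cycle, every vertex picks one random peer, and the two exchange servers only if that lowers the total number of replicas. All names (`replica_count`, `gossip_cycle`, `run`) and the toy graph are hypothetical, and the gain is evaluated globally for brevity, whereas a real gossip protocol would rely on local information only.

```python
import random


def replica_count(edges, placement):
    """Total replicas needed so each master finds all neighbours locally."""
    needed = set()  # (vertex, server) pairs where a replica is required
    for u, v in edges:
        if placement[u] != placement[v]:
            needed.add((v, placement[u]))
            needed.add((u, placement[v]))
    return len(needed)


def gossip_cycle(edges, placement, rng):
    """One cycle: every vertex samples one random peer, and the two swap
    servers only if that strictly lowers the total replica count."""
    vertices = list(placement)
    for u in vertices:
        v = rng.choice(vertices)
        if placement[u] == placement[v]:
            continue
        before = replica_count(edges, placement)
        placement[u], placement[v] = placement[v], placement[u]
        if replica_count(edges, placement) >= before:
            # The swap did not help: undo it.
            placement[u], placement[v] = placement[v], placement[u]


def run(edges, placement, cycles, seed=0):
    """Run several cycles and record the replica count after each one."""
    rng = random.Random(seed)
    history = [replica_count(edges, placement)]
    for _ in range(cycles):
        gossip_cycle(edges, placement, rng)
        history.append(replica_count(edges, placement))
    return history


# Hypothetical toy graph: two 4-cliques joined by the edge (3, 4),
# starting from a placement that ignores the community structure.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7), (3, 4)]
placement = {v: v % 2 for v in range(8)}
history = run(edges, placement, cycles=20)
print(history[0], history[-1])
```

Because vertices only ever swap servers, the number of masters per server stays constant, and because a swap is kept only when it reduces the replica count, the recorded history never increases. Nothing in this sketch guarantees that it reaches the best possible partitioning.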
22. Future Work
• Execution of the algorithm with large datasets using parallel graph processing frameworks like GraphLab and Apache Giraph
• Load balancing using both masters and replicas, and providing different consistency levels
• Smart replication to provide data locality for highly interactive nodes
• Implement different consistency strategies based on access patterns