Leopard: Lightweight Partitioning and Replication for Dynamic Graphs
Jiewen Huang and Daniel Abadi
Yale University
This talk was given by Daniel Abadi at VLDB 2016

  1. Leopard: Lightweight Partitioning and Replication for Dynamic Graphs. Jiewen Huang and Daniel Abadi, Yale University.
  2. Facebook Social Graph
  3. Social Graphs
  4. Web Graphs
  5. Semantic Graphs
  6. Graph Partitioning
     Many systems use hash partitioning, which results in many edges being "cut".
     Given a graph G and an integer k, partition the vertices into k disjoint sets such that:
     ● as few edges as possible are cut
     ● the partitions are as balanced as possible
     This problem is NP-hard.
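  For concreteness, here is a minimal Python sketch (not from the talk; the function names are illustrative) of the two quantities this objective trades off, given edges as vertex pairs and a map from vertex to partition:

      from collections import Counter

      def edge_cut(edges, part):
          # Number of edges whose endpoints land in different partitions.
          return sum(1 for u, v in edges if part[u] != part[v])

      def imbalance(part, k):
          # Size of the largest partition relative to the ideal n/k size;
          # 1.0 means perfectly balanced.
          sizes = Counter(part.values())
          return max(sizes.values()) / (len(part) / k)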
  7. State of the Art: the multilevel scheme and its coarsening phase.
  8. To Make the Problem More Complicated
     "The only constant is change." (Heraclitus)
     Social graphs gain new people and friendships; Semantic Web graphs gain new knowledge; Web graphs gain new websites and links.
  9. Dynamic Graphs. (Figure: vertex A sits across Partition 1 and Partition 2.) After changes, is Partition 1 still the better partition for A?
  10. New Framework
      Repartitioning the entire graph upon every change is far too expensive.
      Leopard:
      ● locally reassesses the partitioning as a result of changes, without a full repartitioning
      ● integrates consideration of replication with partitioning
  11. Outline: Background and Motivation; Leopard (Overview, Computation Skipping, Replication); Experiments.
  12. Algorithm Overview
      For each added or deleted edge <V1, V2>: compute the best partition for V1 using a heuristic, and reassign V1 if needed; then do the same for V2.
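  A rough Python sketch of this loop (a sketch under assumptions: `partition_of`, `neighbors`, and the simple scoring function from the next slides are illustrative stand-ins, not Leopard's actual data structures):

      def on_edge_changed(v1, v2, partition_of, neighbors, k, capacity):
          # After an edge <V1, V2> is added or deleted, reassess both endpoints.
          for v in (v1, v2):
              scores = [score(v, p, partition_of, neighbors, capacity) for p in range(k)]
              best = scores.index(max(scores))
              if best != partition_of[v]:
                  partition_of[v] = best  # reassign v only if a better partition exists

      def score(v, p, partition_of, neighbors, capacity):
          # Simple heuristic from the example slides:
          # (# of v's neighbours in p) * (1 - # vertices in p / capacity)
          nbrs_in_p = sum(1 for u in neighbors[v] if partition_of[u] == p)
          size_p = sum(1 for u in partition_of if partition_of[u] == p)
          return nbrs_in_p * (1 - size_p / capacity)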
  13. Example: Adding an Edge. (Figure: a new edge connects A and B across Partition 1 and Partition 2.)
  14. Compute the Partition for B
      Partition 1: # neighbours = 1, # vertices = 5. Partition 2: # neighbours = 3, # vertices = 3.
      Goals: (1) few cuts and (2) balance.
      Heuristic: # neighbours * (1 - # vertices / capacity)
      Partition 1: 1 * (1 - 5/6) = 0.17. Partition 2: 3 * (1 - 3/6) = 1.5, the higher score.
      This heuristic is kept simple for the sake of presentation; more advanced heuristics are discussed in the paper.
  15. Compute the Partition for A
      Partition 1: # neighbours = 1, # vertices = 4. Partition 2: # neighbours = 2, # vertices = 4.
      Goals: (1) few cuts and (2) balance.
      Heuristic: # neighbours * (1 - # vertices / capacity)
      Partition 1: 1 * (1 - 4/6) = 0.33. Partition 2: 2 * (1 - 4/6) = 0.67, the higher score.
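  Plugging the slides' numbers into the heuristic reproduces these scores (a capacity of 6 is implied by the 5/6 and 3/6 fractions on the slides):

      def h(neighbours, vertices, capacity=6):
          # Heuristic: # neighbours * (1 - # vertices / capacity)
          return neighbours * (1 - vertices / capacity)

      print(round(h(1, 5), 2), round(h(3, 3), 2))  # B: 0.17 vs 1.5  -> partition 2 wins
      print(round(h(1, 4), 2), round(h(2, 4), 2))  # A: 0.33 vs 0.67 -> partition 2 wins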
  16. Example: Adding an Edge (outcome): (1) B stays put; (2) A moves to Partition 2.
  17. Outline: Background and Motivation; Leopard (Overview, Computation Skipping, Replication); Experiments.
  18. Computation Cost
      For each new edge, for both vertices involved in the edge, we must calculate the heuristic for each partition (which may involve communication for remote vertex-location lookups).
  19. Computation Skipping. Observation: as the number of neighbors of a vertex increases, the influence of each new neighbor decreases.
  20. Computation Skipping
      Basic idea: accumulate changes for a vertex; when # accumulated changes / # neighbors exceeds a threshold, recompute the partition for the vertex. For example, with threshold = 20%:
      (1) Compute the partition when V has 10 neighbors. Then 2 new edges are added for V: 2/12 = 17% < 20%, so don't recompute.
      (2) When 1 more new edge is added for V: 3/13 = 23% > 20%, so recompute the partition for V and reset # accumulated changes to 0.
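  A minimal sketch of this bookkeeping (the `pending` counter and the `recompute_partition` hook are illustrative names, not from the paper):

      THRESHOLD = 0.20  # recompute once changes reach 20% of the neighbourhood

      def on_change(v, pending, neighbors):
          # Count the change instead of recomputing immediately.
          pending[v] = pending.get(v, 0) + 1
          if pending[v] / len(neighbors[v]) > THRESHOLD:
              recompute_partition(v)  # the heuristic pass from the earlier slides
              pending[v] = 0          # reset the accumulated-change count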
  21. Outline: Background and Motivation; Leopard (Overview, Computation Skipping, Replication); Experiments.
  22. Replication
      Goals of replication: fault tolerance (k copies of each data point/block) and further cut reduction.
  23. Minimum-Average Replication. It takes two parameters: ● minimum, for fault tolerance ● average, for cut reduction.
  24. Example (min = 2, average = 2.5). Copies per vertex: 2 copies for A, C, D, E, H, J, K, L; 3 copies for F, I; 4 copies for B, G. (8 x 2 + 2 x 3 + 2 x 4 = 30 copies across 12 vertices, an average of 2.5.) The figure distinguishes each vertex's first copy from its replicas.
  25. The same example, with the same copy counts: min = 2, average = 2.5.
  26. How Many Copies? (Figure: vertex A's score for each partition: Partition 1 = 0.1, Partition 2 = 0.2, Partition 3 = 0.3, Partition 4 = 0.4.) minimum = 2, average = 3.
  27. How Many Copies? With the same scores, the two highest-scoring partitions satisfy the minimum requirement of 2 copies. What about the remaining partitions?
  28. Comparing against Past Scores
      Always keep the last n computed scores, ordered from high to low (e.g. 0.9, 0.87, 0.4, 0.3, 0.29, 0.22, ..., 0.2, 0.11, 0.1).
      minimum = 2, average = 3. Cutoff: the top (average - 1)/(k - 1) fraction of the stored scores.
  29. Comparing against Past Scores: minimum = 2, average = 3; the cutoff is the 30th-highest stored score (the 30th and 31st scores straddle the boundary). # copies: 2.
  30. The same frame: cutoff = 30th-highest score; # copies: 2.
  31. Comparing against Past Scores: cutoff = 30th-highest score; # copies: 3.
  32. Comparing against Past Scores: cutoff = 30th-highest score; # copies: 4.
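  Putting these slides together, a sketch of the copy-count rule (assuming, as the slides suggest, one copy per partition whose score beats the cutoff, with `minimum` as a floor; the names are illustrative):

      def num_copies(partition_scores, past_scores, minimum, average, k):
          # Cutoff: the top (average - 1)/(k - 1) fraction of the last n scores,
          # e.g. the 30th-highest score in the slides' example.
          ranked = sorted(past_scores, reverse=True)
          cutoff_rank = max(1, int(len(ranked) * (average - 1) / (k - 1)))
          cutoff = ranked[cutoff_rank - 1]
          replicas = sum(1 for s in partition_scores if s >= cutoff)
          return max(minimum, replicas)  # never fewer than `minimum` copies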
  33. Outline: Background and Motivation; Leopard; Experiments.
  34. Experiment Setup
      ● Comparison points: ○ Leopard with FENNEL heuristics ○ one-pass FENNEL (no vertex reassignment) ○ METIS (static graphs) ○ ParMETIS (repartitioning for dynamic graphs) ○ hash partitioning
      ● Graph datasets: ○ types: social graphs, collaboration graphs, Web graphs, email graphs, and synthetic graphs ○ size: up to 66 million vertices and 1.8 billion edges
  35. Edge Cut (results)
  36. Computation Skipping (results)
  37. Effect of Replication on Edge Cut (results)
  38. Thanks! Q & A
