Presentation slides for Jihoon Ko*, Yunbum Kook*, and Kijung Shin, "Incremental Lossless Graph Summarization", KDD 2020.
Given a fully dynamic graph, represented as a stream of edge insertions and deletions, how can we obtain and incrementally update a lossless summary of its current snapshot?
As large-scale graphs are prevalent, concisely representing them is essential for efficient storage and analysis. Lossless graph summarization is an effective graph-compression technique with many desirable properties. It aims to compactly represent the input graph as (a) a summary graph consisting of supernodes (i.e., sets of nodes) and superedges (i.e., edges between supernodes), which provide a rough description, and (b) edge corrections, which fix the errors induced by that rough description. While a number of batch algorithms, suited to static graphs, have been developed for rapid and compact graph summarization, they are highly inefficient in terms of time and space for dynamic graphs, which are common in practice.
In this work, we propose MoSSo, the first incremental algorithm for lossless summarization of fully dynamic graphs. In response to each change in the input graph, MoSSo updates the output representation by repeatedly moving nodes among supernodes. MoSSo decides which nodes to move, and where to move them, carefully but rapidly, based on several novel ideas. Through extensive experiments on 10 real graphs, we show that MoSSo is (a) Fast and 'any time': processing each change in near-constant time (less than 0.1 millisecond), up to 7 orders of magnitude faster than running state-of-the-art batch methods, (b) Scalable: summarizing graphs with hundreds of millions of edges while requiring sub-linear memory during the process, and (c) Effective: achieving compression ratios comparable even to those of state-of-the-art batch methods.
2. Large-scale Graphs are Everywhere!
Icon made by Freepik from www.flaticon.com
[Figure: social platforms with 2B+ active users, 600M+ users, and 1.5B+ users]
3. Large-scale Graphs are Everywhere! (cont.)
[Figure: 4B+ web pages, 5M papers, 6K+ proteins]
6. Graph Compression for Efficient Manipulation
• Handling large-scale graphs as they are incurs heavy disk or network I/O
• A compact representation makes efficient manipulation possible!
• A larger portion of the original graph can be stored in main memory or cache
10. Previous Graph Compression Techniques
• Various compression techniques have been proposed:
• Relabeling nodes
• Pattern mining
• Lossless graph summarization: one of the most effective compression techniques
• …
• Existing lossless graph summarization methods are batch algorithms for “static graphs”, i.e., a single snapshot (or a few snapshots) of an evolving graph
However, most real-world graphs in fact go through lots of changes...
13. Real-world Graphs are Evolving
[Figure: a network growing from 2M+ users to 2B+ users over 10 years]
Previous algorithms are not designed to accommodate changes in graphs:
they must be rerun from scratch to reflect each change
Solution: incrementally update the compressed graph in a fast and effective manner!
41. Why Lossless Graph Summarization?
• Queryable (retrieving the neighborhood of a query node)
• Queryability: a key building block in numerous graph algorithms (e.g., DFS, PageRank, Dijkstra’s)
• Neighborhood queries can be answered rapidly from the summary and corrections
• Combinable
• Its outputs are also graphs, so they can be compressed further via other compression techniques!
[Figure: summary graph G* = (S, P) with supernodes A = {a}, B = {b, c, d, e}, C = {f, g, h, i}, and edge corrections (C⁺, C⁻), e.g., C⁻ = {fi}]
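The neighborhood-query mechanics behind this slide can be sketched in a few lines. The supernodes follow the running example, but the superedge set `P` below is an assumption chosen to match them, and `C⁺` is left empty since its contents are not legible in the deck; this is an illustrative sketch, not the authors' implementation.

```python
# Answering a neighborhood query from the summary and corrections only.
# Supernodes A, B, C follow the running example; the superedge set and
# the (empty) C+ are assumptions for illustration.
supernode = {"a": "A", "b": "B", "c": "B", "d": "B", "e": "B",
             "f": "C", "g": "C", "h": "C", "i": "C"}
members = {"A": {"a"}, "B": {"b", "c", "d", "e"}, "C": {"f", "g", "h", "i"}}
superedges = {("A", "B"), ("B", "C"), ("C", "C")}
c_plus, c_minus = set(), {frozenset(("f", "i"))}   # C+ and C- = {fi}

def neighbors(u):
    out = set()
    for a, b in superedges:               # edges implied by incident superedges
        if a == supernode[u]:
            out |= members[b]
        if b == supernode[u]:
            out |= members[a]
    out.discard(u)                        # a superedge never implies a self-loop
    out |= {v for e in c_plus for v in e if u in e and v != u}   # add C+
    out -= {v for e in c_minus for v in e if u in e and v != u}  # remove C-
    return out

print(sorted(neighbors("f")))  # ['b', 'c', 'd', 'e', 'g', 'h']
```

Note that the query never materializes the full edge set: it touches only the superedges incident to u's supernode plus the corrections involving u.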
47. Fully Dynamic Graph Stream
A fully dynamic graph can be represented as a sequence {e_t}_{t=0}^∞ of edge additions e_t = {u, v}⁺ and deletions e_t = {u, v}⁻
The graph at time t is constructed by aggregating all edge changes up to time t
[Figure: a stream of +/− edge changes turning the empty graph G₀ at time t = 0 into the current graph G_t at time t]
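As a concrete illustration of this definition, replaying the stream yields the snapshot at time t (the stream contents below are hypothetical):

```python
# Replay a fully dynamic stream to materialize the graph at time t:
# '+' inserts an (undirected) edge, '-' deletes it.
def snapshot(stream):
    edges = set()
    for (u, v), sign in stream:
        e = frozenset((u, v))
        if sign == "+":
            edges.add(e)
        else:
            edges.discard(e)
    return edges

# Hypothetical stream: insert ab, bc; delete ab; insert cd.
stream = [(("a", "b"), "+"), (("b", "c"), "+"),
          (("a", "b"), "-"), (("c", "d"), "+")]
print(len(snapshot(stream)))  # 2
```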
52. Problem Formulation
• Given: a fully dynamic graph stream {e_t}_{t=0}^∞
• Retain: a summary graph G_t* = (S_t, P_t) and edge corrections C_t = (C_t⁺, C_t⁻) of the graph G_t at time t
• To Minimize: the size of the output representation, |P_t| + |C_t⁺| + |C_t⁻|
[Figure: the representation retained at time t (summary graph G* = (S, P) with supernodes A = {a}, B = {b, c, d, e}, C = {f, g, h, i} and edge corrections (C⁺, C⁻)) being updated upon each edge change e_{t+1} from the stream]
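The objective is just a count of encoded items. A minimal sketch (the numbers below are made up for illustration, not from the paper): the trivial encoding, where every node is its own supernode, costs exactly |E|, so summarization pays off whenever the cost drops below |E|.

```python
# Description cost of the output representation: |P| + |C+| + |C-|.
def cost(P, c_plus, c_minus):
    return len(P) + len(c_plus) + len(c_minus)

P = {("A", "B"), ("B", "C"), ("C", "C")}      # superedges (hypothetical)
c_plus, c_minus = {("a", "f")}, {("f", "i")}  # corrections (hypothetical)
num_edges = 25                                # |E| of the hypothetical graph
print(cost(P, c_plus, c_minus), "vs trivial cost", num_edges)
```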
73. MoSSo: Details (Step 1) – MCMC
Notation: N(u) denotes the neighborhood of a node u
[Figure: changed edge e = {u, v}]
The neighborhood N(u) of the input node u is the most likely to be affected
Focus on testing nodes in N(u)
P1. To sample neighbors, one must retrieve all of N(u) from G* and C, which takes O(average degree) time on average
Deadly to scalability…
Graph densification law [LKF05]:
“The average degree of real-world graphs increases over time.”
74. MoSSo: Details (Step 1) – MCMC (cont.)
S1. Without fully retrieving N(u), sample c neighbors uniformly at random using the Markov chain Monte Carlo (MCMC) method
MCMC method: sampling from a random variable whose probability density is proportional to a given function
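The deck does not spell out MoSSo's chain, so here is only the generic recipe it relies on: a Metropolis–Hastings independence sampler that targets the uniform distribution even though proposals are drawn from a skewed distribution (a sketch, not MoSSo's actual kernel).

```python
import random

# Generic Metropolis-Hastings independence sampler: target the *uniform*
# distribution over `items` even though proposals are drawn from a skewed
# distribution q. For a uniform target, a proposed move from x to y is
# accepted with probability min(1, q(x) / q(y)).
def mh_uniform_samples(items, q, steps, seed=0):
    rng = random.Random(seed)
    x = items[0]
    out = []
    for _ in range(steps):
        y = rng.choices(items, weights=[q[i] for i in items])[0]  # propose y ~ q
        if rng.random() < min(1.0, q[x] / q[y]):                  # accept or stay
            x = y
        out.append(x)
    return out

weights = {"a": 9, "b": 3, "c": 1}  # heavily skewed proposal distribution
samples = mh_uniform_samples(list(weights), weights, 30000)
```

Despite "a" being proposed 9x more often than "c", the three items occur roughly equally often in the chain's output.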
79. MoSSo: Details (Step 1) – Probabilistic Filtering
Test all the sampled nodes? Better not…
P2. High-degree nodes are tested too frequently, since P(v sampled) ∝ deg(v), and each test is computationally heavy (too many neighbors):
- updating the optimal encoding
- computing the change Δφ in the description cost
S2. Test a sampled node v with probability 1/deg(v)
(1) Likely to avoid expensive tests on high-degree nodes
(2) In expectation, P(v is actually tested) is the same across all nodes v (i.e., it smooths out the imbalance in the number of tests)
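Why the 1/deg(v) filter equalizes things: the two factors cancel, P(v tested) ∝ deg(v) · 1/deg(v) = const. A small simulation (toy degrees, illustrative names):

```python
import random
from collections import Counter

# Degree-proportional sampling followed by probabilistic filtering:
# a node v is sampled with P(v) proportional to deg(v), then kept with
# probability 1/deg(v), so every node is tested equally often in expectation.
def sample_then_filter(deg, trials, seed=0):
    rng = random.Random(seed)
    nodes = list(deg)
    tested = Counter()
    for _ in range(trials):
        v = rng.choices(nodes, weights=[deg[n] for n in nodes])[0]  # P(v) ~ deg(v)
        if rng.random() < 1 / deg[v]:                               # keep w.p. 1/deg(v)
            tested[v] += 1
    return tested

deg = {"hub": 50, "mid": 5, "leaf": 1}
tested = sample_then_filter(deg, 100000)
```

The hub is sampled 50x more often than the leaf, yet all three end up tested a similar number of times.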
84. MoSSo: Details (Step 2) – Coarse Clustering
P3. Among many choices, how do we find “good” candidates for the testing node y (i.e., those likely to decrease φ)?
S3. Utilize an incremental coarse clustering
Desirable: nodes with “similar connectivity” end up in the same cluster
Any incremental coarse clustering with this property works!
Min-hashing:
(1) Fast, with the desirable theoretical property P(u, v in same cluster) ∝ Jaccard(N(u), N(v)) ⇒ groups nodes with similar connectivity
(2) Clusters from min-hashing can be updated rapidly in response to edge changes
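The property behind min-hashing: under a random permutation of the universe, two sets share the same minimum hash with probability exactly their Jaccard similarity. A toy permutation-based sketch (not MoSSo's implementation):

```python
import random

# Toy MinHash: for a random permutation of the universe, two sets have the
# same minimum hash value with probability Jaccard(A, B), which is why
# bucketing nodes by the min-hash of their neighborhoods groups nodes
# with similar connectivity.
def minhash(s, perm):
    return min(perm[x] for x in s)

def estimate_jaccard(a, b, num_perms=2000, seed=0):
    rng = random.Random(seed)
    universe = sorted(a | b)
    hits = 0
    for _ in range(num_perms):
        ranks = list(range(len(universe)))
        rng.shuffle(ranks)                  # a fresh random permutation
        perm = dict(zip(universe, ranks))
        hits += minhash(a, perm) == minhash(b, perm)
    return hits / num_perms

a, b = {1, 2, 3, 4}, {3, 4, 5, 6}           # true Jaccard = 2/6
est = estimate_jaccard(a, b)
```

In practice one keeps a few hash values per node and updates them incrementally as edges arrive, rather than re-permuting the universe.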
90. MoSSo: Details (Step 2) – Separation of Node
P4. Moving nodes in this way only decreases or maintains |S|
Discourages reorganizing supernodes in the long run
S4. Instead of finding a candidate for the testing node y, separate y from its supernode S_y and create a singleton supernode {y} with escape probability e
Injects flexibility into the supernodes (a partition of V)
Empirically, a significant improvement in compression rates
As before, accept or reject the separation depending on Δφ
97. Experimental Settings
• 10 real-world graphs (up to 0.3B edges): web, social, collaboration, email, and others
• Batch lossless graph summarization algorithms:
• Randomized [NSR08], SAGS [KNL15], SWeG [SGKR19]
98. Baseline Incremental Algorithms
• MoSSo-Greedy:
• Greedily moves the nodes related to the inserted/deleted edge, while fixing the other nodes, so that the objective is minimized
• MoSSo-MCMC:
• See the paper for details
• MoSSo-Simple:
• MoSSo without coarse clustering
101. Experiment Results: Speed
• MoSSo processed each change up to 7 orders of magnitude faster than running the fastest batch algorithm
[Figure: per-change processing times on insertion-only graph streams (e.g., UK) and fully dynamic graph streams]
104. Experiment Results: Compression Performance
Notation: compression ratio = (|P| + |C⁺| + |C⁻|) / |E|
• The compression ratio of MoSSo was comparable even to those of the best batch algorithms
• MoSSo achieved the best compression ratios among the streaming algorithms
[Figure: compression ratios on the PR, EN, FB, DB, YT, SK, LJ, EU, HW, and UK datasets]
112. Conclusions
We propose MoSSo, the first algorithm for incremental lossless graph summarization:
fast and ‘any time’, effective, and scalable
The code and datasets used in the paper are available at http://dmlab.kaist.ac.kr/mosso/