Incremental Lossless
Graph Summarization
Jihoon Ko* Yunbum Kook* Kijung Shin
Large-scale Graphs are Everywhere!
Icon made by Freepik from www.flaticon.com
2B+ active users
600M+ users
1.5B+ users
Large-scale Graphs are Everywhere! (cont.)
4B+ web pages 5M papers 6K+ proteins
Graph Compression for Efficient Manipulation
• Handling large-scale graphs as they are → heavy disk or network I/O
• Their compact representation makes efficient manipulation possible!
• A larger portion of the original graphs can be stored in main memory or cache
Previous Graph Compression Techniques
• Various compression techniques have been proposed
• Relabeling nodes
• Pattern mining
• Lossless graph summarization → one of the most effective compression techniques
• …
• Lossless graph summarization is a batch algorithm for “static graphs”,
which refer to a single or a few snapshots of evolving graphs
However, most real-world graphs in fact go through many changes...
Real-world Graphs are Evolving
2M+ users → 2B+ users in 10 years
Previous algorithms: not designed to accommodate changes in graphs
→ They must be rerun from scratch to reflect changes
Solution: incrementally update compressed graphs in a fast and effective manner!
Outline
• Preliminaries
• Proposed Algorithm: MoSSo
• Experimental Results
• Conclusions
Lossless Graph Summarization: Example
Input graph with 11 edges (nodes a–i)
→ Summarized into supernodes A = {a}, B = {b, c, d, e}, C = {f, g, h, i}
with corrections “Add {a, f} & Delete {f, i}”
Output with 4 edges
Lossless Graph Summarization: Definition
Lossless summarization yields (1) a summary graph and (2) edge corrections,
while minimizing the edge count |P| + |C+| + |C−| (≈ “description cost”, denoted by φ)
Proposed in [NRS08], based on the Minimum Description Length principle
Input graph G = (V, E) → Summary graph G* = (S, P) with A = {a}, B = {b, c, d, e}, C = {f, g, h, i},
and edge corrections (C+, C−): C+ = {af}, C− = {fi}
Lossless Graph Summarization: Definition
1. Summary graph G* = (S, P)
• Supernodes S = a partition of V, where each supernode is a set of nodes
• Superedges P = a set of pairs of supernodes (ex: {A, B} in the example above)
2. Edge corrections (C+, C−)
• Positive residual graph C+ (edges to add back)
• Negative residual graph C− (edges to remove)
Example: supernodes A = {a}, B = {b, c, d, e}, C = {f, g, h, i} connected by superedges; C+ = {af}, C− = {fi}
Lossless Graph Summarization: Notation
• Supernode containing u: S_u (i.e., u ∈ S_u)
• Edges between supernodes A and B: E_AB = {uv ∈ E : u ∈ A, v ∈ B (u ≠ v)}
• All possible edges between A and B: T_AB = {uv ⊆ V : u ∈ A, v ∈ B (u ≠ v)}
• Neighborhood of a node u: N(u) = {v ∈ V : uv ∈ E}
• Nodes incident to u in C+ (or C−): C+(u) (or C−(u))
• Compression ratio: (|P| + |C+| + |C−|) / |E|
Lossless Graph Summarization: Optimal Encoding
For summarization, determining the supernodes S (a partition of V) is our main concern
→ Given S, the superedges P and edge corrections C are determined optimally
Lossless Graph Summarization: Optimal Encoding
The edges E_AB between two supernodes are encoded as either
(1) a superedge together with C−, or (2) no superedge, with C+:
Case 1: |E_AB| ≥ (|T_AB| + 1)/2 → add superedge AB to P and T_AB ∖ E_AB to C− (cost: 1 + |T_AB| − |E_AB|)
Case 2: |E_AB| < (|T_AB| + 1)/2 → add all edges in E_AB to C+ (cost: |E_AB|)
Notation: E_AB = edges between supernodes A and B; T_AB = all possible edges between A and B
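The case split above maps directly to code. Below is a minimal sketch (my own illustration, not the authors' implementation); `encode_pair` is a hypothetical helper name.

```python
def encode_pair(num_edges, num_possible):
    """Optimal encoding of the edges E_AB between supernodes A and B.

    num_edges    = |E_AB| (edges actually present between A and B)
    num_possible = |T_AB| (all possible edges between A and B)

    Returns the chosen encoding and its contribution to the
    description cost phi = |P| + |C+| + |C-|.
    """
    if num_edges >= (num_possible + 1) / 2:
        # Case 1: one superedge AB, plus |T_AB| - |E_AB| entries in C-.
        return "superedge", 1 + num_possible - num_edges
    # Case 2: no superedge; every edge of E_AB goes to C+.
    return "C+", num_edges

# In the running example: A = {a} and B = {b, c, d, e} have all 4 possible
# edges present -> a superedge of cost 1; A and C = {f, g, h, i} share only
# the edge {a, f} -> C+ of cost 1.
```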
Lossless Graph Summarization: Optimal Encoding
With supernodes A = {a}, B = {b, c, d, e}, C = {f, g, h, i}:
• Superedge AB: C+ = {af}, C− = {fi} → φ = 2 + 1 + 1 = 4
• C+ only: C+ = {ab, ac, ad, ae, af}, C− = {fi} → φ = 1 + 5 + 1 = 7
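Summing the cheaper option over all supernode pairs gives the description cost φ of a partition. The sketch below is my own illustration; the edge list is an assumed reconstruction consistent with the slide's supernodes and corrections (the exact input edges are not fully recoverable from the transcript).

```python
from itertools import combinations, combinations_with_replacement

def phi(parts, edge_list):
    """Description cost |P| + |C+| + |C-| of a partition under the
    optimal pairwise encoding (the cheaper of the two cases per pair)."""
    edges = {frozenset(e) for e in edge_list}
    cost = 0
    for A, B in combinations_with_replacement(parts, 2):
        if A is B:  # pairs inside a single supernode (self-superedge)
            possible = len(A) * (len(A) - 1) // 2
            present = sum(1 for p in combinations(sorted(A), 2) if frozenset(p) in edges)
        else:
            possible = len(A) * len(B)
            present = sum(1 for u in A for v in B if frozenset((u, v)) in edges)
        if possible > 0:
            cost += min(present, 1 + possible - present)  # C+ vs superedge + C-
    return cost

# Assumed edges: a adjacent to b..f; {f, g, h, i} a near-clique missing {f, i}.
EDGES = [("a","b"), ("a","c"), ("a","d"), ("a","e"), ("a","f"),
         ("f","g"), ("f","h"), ("g","h"), ("g","i"), ("h","i")]
PARTS = [{"a"}, {"b","c","d","e"}, {"f","g","h","i"}]
```

On this graph, `phi(PARTS, EDGES)` reproduces the slide's cost of 4, while the all-singletons partition costs one unit per edge.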
Recovery: Example
Given C+ = {af}, C− = {fi} and supernodes A = {a}, B = {b, c, d, e}, C = {f, g, h, i}:
1. Add all pairs of nodes between two adjacent supernodes
2. Remove all edges in C−
3. Add all edges in C+
→ The original graph on nodes a–i is recovered exactly
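The three steps translate into a short routine. A minimal sketch (my own, not the paper's code), run on the running example; treating a self-superedge {C, C} as "all pairs inside C" is an assumption consistent with the figures.

```python
from itertools import combinations

def recover(supernodes, superedges, c_plus, c_minus):
    """Recover the original edge set from (summary graph, corrections)."""
    edges = set()
    # Step 1: add all pairs of nodes between two adjacent supernodes.
    for A, B in superedges:
        if A == B:  # self-superedge: all pairs inside the supernode
            edges |= {frozenset(p) for p in combinations(sorted(supernodes[A]), 2)}
        else:
            edges |= {frozenset((u, v)) for u in supernodes[A] for v in supernodes[B]}
    # Step 2: remove all edges in C-.
    edges -= {frozenset(e) for e in c_minus}
    # Step 3: add all edges in C+.
    edges |= {frozenset(e) for e in c_plus}
    return edges

SN = {"A": {"a"}, "B": {"b", "c", "d", "e"}, "C": {"f", "g", "h", "i"}}
SE = [("A", "B"), ("C", "C")]
recovered = recover(SN, SE, c_plus=[("a", "f")], c_minus=[("f", "i")])
```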
Why Lossless Graph Summarization?
• Queryable (retrieving the neighborhood of a query node)
• Neighborhood queries are key building blocks in numerous graph algorithms (ex: DFS, PageRank, Dijkstra’s, etc.)
• They can be answered rapidly from a summary graph and corrections
• Combinable
• Its outputs are also graphs → they can be further compressed via other compression techniques!
Example: summary graph G* = (S, P) with A = {a}, B = {b, c, d, e}, C = {f, g, h, i}; edge corrections C+ = {af}, C− = {fi}
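Queryability is the point: N(u) can be read straight off the summary and corrections without decompressing the whole graph. A minimal sketch with hypothetical names, on the same example:

```python
def neighbors(u, supernodes, superedges, c_plus, c_minus):
    """Retrieve N(u) directly from the summary graph and corrections."""
    s_u = next(name for name, members in supernodes.items() if u in members)
    out = set()
    for A, B in superedges:          # expand superedges incident to S_u
        if A == s_u:
            out |= supernodes[B]
        if B == s_u:
            out |= supernodes[A]
    out.discard(u)                   # a node is not its own neighbor
    for x, y in c_minus:             # drop the negative corrections C-(u)
        if u == x: out.discard(y)
        if u == y: out.discard(x)
    for x, y in c_plus:              # add the positive corrections C+(u)
        if u == x: out.add(y)
        if u == y: out.add(x)
    return out

SN = {"A": {"a"}, "B": {"b", "c", "d", "e"}, "C": {"f", "g", "h", "i"}}
SE = [("A", "B"), ("C", "C")]
```

For node f, the self-superedge {C, C} contributes g, h, i; C− removes i and C+ adds a.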
Fully Dynamic Graph Stream
Fully dynamic graphs can be represented by a sequence {e_t} (t = 0, 1, …) of edge additions e_t = {u, v}+ and deletions e_t = {u, v}−
→ The graph at time t is constructed by aggregating all edge changes up to time t
Stream of changes: + − − + − + ……
Empty graph G_0 at time t = 0 → current graph G_t at time t
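The aggregation is straightforward to sketch (my own minimal illustration):

```python
def aggregate(stream):
    """Replay a fully dynamic stream of (edge, sign) changes, starting
    from the empty graph G_0, and return the edge set of G_t."""
    edges = set()
    for (u, v), sign in stream:
        e = frozenset((u, v))
        if sign == "+":
            edges.add(e)         # edge addition {u, v}+
        else:
            edges.discard(e)     # edge deletion {u, v}-
    return edges

stream = [(("a", "b"), "+"), (("a", "c"), "+"),
          (("a", "b"), "-"), (("b", "c"), "+")]
```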
Problem Formulation
• Given: a fully dynamic graph stream {e_t} (t = 0, 1, …)
• Retain: a summary graph G_t* = (S_t, P_t) and edge corrections C_t = (C_t+, C_t−) of the graph G_t at each time t
• To Minimize: the size of the output representation, |P_t| + |C_t+| + |C_t−|
Retained at time t: summary graph G* = (S, P) with A = {a}, B = {b, c, d, e}, C = {f, g, h, i}, and edge corrections C+ = {af}, C− = {fi}; the next edge change e_{t+1} then arrives from the stream ……
Challenge: Fast Updates yet Good Compression
Outline
• Preliminaries
• Proposed Algorithm: MoSSo
• Experimental Results
• Conclusions
Scheme for Incremental Summarization
Current graph (nodes a–i) → lossless summarization G*: A = {a}, B = {b, c, d, e}, C = {f, g, h, i}
C+ = {af}, C− = {fi}; φ = |P| + |C+| + |C−| = 4
Scheme for Incremental Summarization
New edge {a, j} arrives → aj is placed in C+: C+ = {af, aj}, C− = {fi}
φ = |P| + |C+| + |C−| = 5
Scheme for Incremental Summarization
New edge {a, j}: how do we update the current summarization?
Scheme for Incremental Summarization
Our approach:
(1) Attempt to move nodes among supernodes
(2) Accept the move if φ decreases
(3) Reject otherwise
Scheme for Incremental Summarization
MoSSo finds...
(1) Testing nodes whose move likely results in φ ↓
(2) Candidates for each testing node, likely resulting in φ ↓
Scheme for Incremental Summarization
Moving the testing node j into supernode B is accepted
→ C+ = {af}, C− = {fi}; φ = |P| + |C+| + |C−| = 4
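The accept-if-φ-decreases rule can be made concrete with a small cost function. Everything below is my own sketch (`phi` and `move` are hypothetical helpers; the edge list is an assumed reconstruction of the running example plus the new edge {a, j}): moving j into B drops φ from 5 to 4, so the move is accepted.

```python
from itertools import combinations

def phi(parts, edge_list):
    """Description cost |P| + |C+| + |C-| under the optimal pairwise encoding."""
    edges = {frozenset(e) for e in edge_list}
    cost = 0
    for i, A in enumerate(parts):
        for B in parts[i:]:
            if A is B:  # pairs inside one supernode (self-superedge)
                possible = len(A) * (len(A) - 1) // 2
                present = sum(1 for p in combinations(sorted(A), 2) if frozenset(p) in edges)
            else:
                possible = len(A) * len(B)
                present = sum(1 for u in A for v in B if frozenset((u, v)) in edges)
            if possible > 0:
                cost += min(present, 1 + possible - present)  # C+ vs superedge + C-
    return cost

def move(parts, node, dest_idx):
    """Return a new partition with `node` moved into parts[dest_idx]."""
    new_parts = []
    for i, p in enumerate(parts):
        q = (p - {node}) | ({node} if i == dest_idx else set())
        if q:
            new_parts.append(q)
    return new_parts

EDGES = [("a","b"), ("a","c"), ("a","d"), ("a","e"), ("a","f"), ("a","j"),
         ("f","g"), ("f","h"), ("g","h"), ("g","i"), ("h","i")]
parts = [{"a"}, {"b","c","d","e"}, {"f","g","h","i"}, {"j"}]
before = phi(parts, EDGES)       # aj must sit in C+
proposal = move(parts, "j", 1)   # try moving j into B
after = phi(proposal, EDGES)     # superedge AB now also covers aj
accepted = after < before        # accept: phi decreased
```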
MoSSo: Main Ideas
Given a changed edge e = {u, v}:
• Step 1: Set testing nodes (which nodes to move?)
• (S1) No restoration from the current summarization G_t* = (S_t, P_t), C_t = (C_t+, C_t−)
• (S2) Reduce redundant testing by stochastic filtering
• Step 2: Find candidates (move each testing node into which supernode?)
• (S3) Utilize an incremental coarse clustering
• (S4) Inject flexibility into the reorganization of supernodes
MoSSo: Details
Parameters: sample number c; escape probability e
Input: summary graph G_t* & edge corrections C_t; an edge change {u, v}+ (addition) or {u, v}− (deletion)
Output: summary graph G_{t+1}* & edge corrections C_{t+1}
MoSSo: Details (Step 1) – MCMC
The neighborhood N(u) of an endpoint u of the changed edge e = {u, v} is the most likely to be affected
→ Focus on testing nodes in N(u)
P1. To sample neighbors, one must retrieve all of N(u) from G* and C, which takes O(average degree) time on average → deadly to scalability…
Graph densification law [LKF05]: “The average degree of real-world graphs increases over time.”
MoSSo: Details (Step 1) – MCMC (cont.)
S1. Without fully retrieving N(u), sample c neighbors uniformly at random using the Markov Chain Monte Carlo (MCMC) method
→ MCMC: sampling from a random variable whose probability density is proportional to a given function
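One generic way to realize S1 (a sketch of the standard Metropolis–Hastings technique, not necessarily the exact chain in the paper): u's neighbors are stored in several disjoint pieces (e.g., one block per incident superedge, plus C+(u)), and we want uniform samples over their union without concatenating them. Propose by picking a uniform piece and a uniform element inside it, and accept with probability min(1, |piece(proposed)| / |piece(current)|); the stationary distribution is then uniform over all neighbors.

```python
import random

def mh_uniform_sample(pieces, steps=50, rng=random):
    """Approximately uniform sample from the disjoint union of `pieces`,
    touching only one randomly chosen piece per step (Metropolis-Hastings)."""
    cur_piece = rng.randrange(len(pieces))
    cur = rng.choice(pieces[cur_piece])
    for _ in range(steps):
        prop_piece = rng.randrange(len(pieces))
        prop = rng.choice(pieces[prop_piece])
        # Uniform target: accept w.p. min(1, |piece(prop)| / |piece(cur)|),
        # which exactly cancels the bias of the piecewise proposal.
        if rng.random() < len(pieces[prop_piece]) / len(pieces[cur_piece]):
            cur_piece, cur = prop_piece, prop
    return cur

random.seed(0)
# e.g., one block of neighbors implied by a superedge, plus C+(u)
pieces = [["b", "c", "d", "e"], ["f"]]
samples = [mh_uniform_sample(pieces) for _ in range(5000)]
```

Without the acceptance correction, "f" would be drawn half the time; with it, all five neighbors come out near-uniformly.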
MoSSo: Details (Step 1) – Probabilistic Filtering
Should we test all the sampled nodes? → Better not…
P2. High-degree nodes are tested too frequently, as P(v is sampled) ∝ deg(v),
and testing them is computationally heavy (too many neighbors):
- updating the optimal encoding
- computing the change Δφ in the description cost
S2. Test a sampled node v with probability 1/deg(v)
(1) Likely avoids expensive testing of high-degree nodes
(2) In expectation, P(v is actually tested) is the same across all nodes v (i.e., it smooths the imbalance in the number of testings)
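S2's balancing effect is easy to verify numerically (my own sketch, with assumed toy degrees): if v is sampled with probability proportional to deg(v) but tested only with probability 1/deg(v), every node ends up tested with the same probability per trial.

```python
import random

random.seed(42)
deg = {"w": 1, "x": 2, "y": 5, "z": 10}      # assumed toy degrees
nodes = list(deg)
weights = [deg[v] for v in nodes]

tested = {v: 0 for v in nodes}
trials = 40000
for _ in range(trials):
    v = random.choices(nodes, weights=weights)[0]  # P(v sampled) proportional to deg(v)
    if random.random() < 1.0 / deg[v]:             # filter w.p. 1/deg(v)
        tested[v] += 1

# Despite a 10x spread in degree, each node is tested roughly
# trials / sum(deg) = 40000 / 18 times.
```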
MoSSo: Details (Step 2) – Coarse Clustering
P3. Among many choices, how do we find “good” candidates (likely resulting in φ ↓)?
S3. Utilize an incremental coarse clustering
→ Desirable property: nodes with “similar connectivity” fall in the same cluster
→ Any incremental coarse clustering with this property will do!
Min-hashing:
(1) Fast, with the desirable theoretical property P({u, v} in the same cluster) ∝ Jaccard(N(u), N(v)) ⇒ groups nodes with similar connectivity
(2) Clusters from min-hashing are updated rapidly in response to edge changes
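Min-hashing itself is standard: keep, per node, the minimum of each of k hash functions applied to its neighborhood; two signatures agree on a component with probability equal to the Jaccard similarity of the neighborhoods. A minimal static sketch (the paper's incremental maintenance is not shown here):

```python
def minhash(neighborhood, seeds):
    """k-component MinHash signature of a node's neighborhood.
    hash((seed, v)) acts as the seed-th hash function applied to v."""
    return [min(hash((seed, v)) for v in neighborhood) for seed in seeds]

def estimated_jaccard(sig1, sig2):
    """The fraction of agreeing components estimates Jaccard(N(u), N(v))."""
    return sum(a == b for a, b in zip(sig1, sig2)) / len(sig1)

seeds = range(400)
n_u = set(range(10))              # N(u) = {0, ..., 9}
n_v = set(range(8)) | {20, 21}    # N(v); true Jaccard = 8/12
est = estimated_jaccard(minhash(n_u, seeds), minhash(n_v, seeds))
```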
MoSSo: Details (Step 2) – Separation of Node
P4. Moving nodes this way only decreases or maintains |S|
→ Discourages reorganizing supernodes in the long run
S4. Instead of finding a candidate, with escape probability e, separate y from S_y and create a singleton supernode {y}
→ Injects flexibility into the supernodes (a partition of V)
→ Empirically, a significant improvement in compression ratios
As before, accept or reject the separation depending on Δφ
Outline
• Preliminaries
• Proposed Algorithm: MoSSo
• Experimental Results
• Conclusions
Experimental Settings
• 10 real-world graphs (up to 0.3B edges): web, social, collaboration, email, and others!
• Batch lossless graph summarization algorithms:
• Randomized [NRS08], SAGS [KNL15], SWeG [SGKR19]
Baseline Incremental Algorithms
• MoSSo-Greedy:
• Greedily moves nodes related to the inserted/deleted edge, while fixing the other nodes, so that the objective is minimized
• MoSSo-MCMC:
• See the paper for details
• MoSSo-Simple:
• MoSSo without coarse clustering
Experimental Results: Speed
• MoSSo processed each change up to 7 orders of magnitude faster than running the fastest batch algorithm from scratch
(Results shown for insertion-only graph streams, e.g., UK, and fully dynamic graph streams)
Experimental Results: Compression Performance
• The compression ratio of MoSSo was even comparable to those of the best batch algorithms
• MoSSo achieved the best compression ratios among the streaming algorithms
(Datasets: PR, EN, FB, DB, YT, SK, LJ, EU, HW, UK; compression ratio = (|P| + |C+| + |C−|) / |E|)
Experimental Results: Scalability
• MoSSo processed each change in near-constant time
(Results shown on EU (insertion-only) and SK (fully dynamic))
Outline
• Preliminaries
• Proposed Algorithm: MoSSo
• Experimental Results
• Conclusions
Conclusions
We propose MoSSo, the first algorithm for incremental lossless graph summarization:
• Fast and ‘any time’
• Effective
• Scalable
The code and datasets used in the paper are available at http://dmlab.kaist.ac.kr/mosso/
Incremental Lossless
Graph Summarization
Jihoon Ko* Yunbum Kook* Kijung Shin
GraphBLAS: A linear algebraic approach for high-performance graph queriesGábor Szárnyas
 
Linear regression, costs & gradient descent
Linear regression, costs & gradient descentLinear regression, costs & gradient descent
Linear regression, costs & gradient descentRevanth Kumar
 
Pre-calculus 1, 2 and Calculus I (exam notes)
Pre-calculus 1, 2 and Calculus I (exam notes)Pre-calculus 1, 2 and Calculus I (exam notes)
Pre-calculus 1, 2 and Calculus I (exam notes)William Faber
 
Finding Top-k Similar Graphs in Graph Database @ ReadingCircle
Finding Top-k Similar Graphs in Graph Database @ ReadingCircleFinding Top-k Similar Graphs in Graph Database @ ReadingCircle
Finding Top-k Similar Graphs in Graph Database @ ReadingCirclecharlingual
 
GAN in_kakao
GAN in_kakaoGAN in_kakao
GAN in_kakaoJunho Kim
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelineChenYiHuang5
 
Complex differentiation contains analytic function.pptx
Complex differentiation contains analytic function.pptxComplex differentiation contains analytic function.pptx
Complex differentiation contains analytic function.pptxjyotidighole2
 
Unit-1 Basic Concept of Algorithm.pptx
Unit-1 Basic Concept of Algorithm.pptxUnit-1 Basic Concept of Algorithm.pptx
Unit-1 Basic Concept of Algorithm.pptxssuser01e301
 
Method of characteristic for bell nozzle design
Method of characteristic for bell nozzle designMethod of characteristic for bell nozzle design
Method of characteristic for bell nozzle designMahdi H. Gholi Nejad
 
Max flows via electrical flows (long talk)
Max flows via electrical flows (long talk)Max flows via electrical flows (long talk)
Max flows via electrical flows (long talk)Thatchaphol Saranurak
 
Differential Geometry for Machine Learning
Differential Geometry for Machine LearningDifferential Geometry for Machine Learning
Differential Geometry for Machine LearningSEMINARGROOT
 
IJCAI13 Paper review: Large-scale spectral clustering on graphs
IJCAI13 Paper review: Large-scale spectral clustering on graphsIJCAI13 Paper review: Large-scale spectral clustering on graphs
IJCAI13 Paper review: Large-scale spectral clustering on graphsAkisato Kimura
 
Chapter 1 - What is a Function.pdf
Chapter 1 - What is a Function.pdfChapter 1 - What is a Function.pdf
Chapter 1 - What is a Function.pdfManarKareem1
 

Similaire à "Incremental Lossless Graph Summarization", KDD 2020 (20)

20180831 riemannian representation learning
20180831 riemannian representation learning20180831 riemannian representation learning
20180831 riemannian representation learning
 
Ch 5-integration-part-1
Ch 5-integration-part-1Ch 5-integration-part-1
Ch 5-integration-part-1
 
Direct solution of sparse network equations by optimally ordered triangular f...
Direct solution of sparse network equations by optimally ordered triangular f...Direct solution of sparse network equations by optimally ordered triangular f...
Direct solution of sparse network equations by optimally ordered triangular f...
 
Applied Algorithms and Structures week999
Applied Algorithms and Structures week999Applied Algorithms and Structures week999
Applied Algorithms and Structures week999
 
GraphBLAS: A linear algebraic approach for high-performance graph queries
GraphBLAS: A linear algebraic approach for high-performance graph queriesGraphBLAS: A linear algebraic approach for high-performance graph queries
GraphBLAS: A linear algebraic approach for high-performance graph queries
 
Linear regression, costs & gradient descent
Linear regression, costs & gradient descentLinear regression, costs & gradient descent
Linear regression, costs & gradient descent
 
Pre-calculus 1, 2 and Calculus I (exam notes)
Pre-calculus 1, 2 and Calculus I (exam notes)Pre-calculus 1, 2 and Calculus I (exam notes)
Pre-calculus 1, 2 and Calculus I (exam notes)
 
Finding Top-k Similar Graphs in Graph Database @ ReadingCircle
Finding Top-k Similar Graphs in Graph Database @ ReadingCircleFinding Top-k Similar Graphs in Graph Database @ ReadingCircle
Finding Top-k Similar Graphs in Graph Database @ ReadingCircle
 
GAN in_kakao
GAN in_kakaoGAN in_kakao
GAN in_kakao
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipeline
 
Basic calculus (ii) recap
Basic calculus (ii) recapBasic calculus (ii) recap
Basic calculus (ii) recap
 
Complex differentiation contains analytic function.pptx
Complex differentiation contains analytic function.pptxComplex differentiation contains analytic function.pptx
Complex differentiation contains analytic function.pptx
 
GraphTransformations.pptx
GraphTransformations.pptxGraphTransformations.pptx
GraphTransformations.pptx
 
Unit-1 Basic Concept of Algorithm.pptx
Unit-1 Basic Concept of Algorithm.pptxUnit-1 Basic Concept of Algorithm.pptx
Unit-1 Basic Concept of Algorithm.pptx
 
Method of characteristic for bell nozzle design
Method of characteristic for bell nozzle designMethod of characteristic for bell nozzle design
Method of characteristic for bell nozzle design
 
Max flows via electrical flows (long talk)
Max flows via electrical flows (long talk)Max flows via electrical flows (long talk)
Max flows via electrical flows (long talk)
 
Differential Geometry for Machine Learning
Differential Geometry for Machine LearningDifferential Geometry for Machine Learning
Differential Geometry for Machine Learning
 
Differentiation
Differentiation Differentiation
Differentiation
 
IJCAI13 Paper review: Large-scale spectral clustering on graphs
IJCAI13 Paper review: Large-scale spectral clustering on graphsIJCAI13 Paper review: Large-scale spectral clustering on graphs
IJCAI13 Paper review: Large-scale spectral clustering on graphs
 
Chapter 1 - What is a Function.pdf
Chapter 1 - What is a Function.pdfChapter 1 - What is a Function.pdf
Chapter 1 - What is a Function.pdf
 

Dernier

Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制vexqp
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样wsppdmt
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.pptibrahimabdi22
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATIONLakpaYanziSherpa
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制vexqp
 

Dernier (20)

Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 

"Incremental Lossless Graph Summarization", KDD 2020

  • 1. Incremental Lossless Graph Summarization Jihoon Ko* Yunbum Kook* Kijung Shin
  • 2. Large-scale Graphs are Everywhere! Icon made by Freepik from www.flaticon.com 2B+ active users 600M+ users 1.5B+ users
  • 3. Large-scale Graphs are Everywhere! (cont.) 4B+ web pages 5M papers 6K+ proteins Icon made by Freepik from www.flaticon.com
  • 4. Graph Compression for Efficient Manipulation • Handling large-scale graphs as they are... → heavy disk or network I/O
  • 5. Graph Compression for Efficient Manipulation • Handling large-scale graphs as they are... → heavy disk or network I/O • Their compact representation makes efficient manipulation possible!
  • 6. Graph Compression for Efficient Manipulation • Handling large-scale graphs as they are... → heavy disk or network I/O • Their compact representation makes efficient manipulation possible! • A larger portion of the original graph can be stored in main memory or cache
  • 7. Previous Graph Compression Techniques • Various compression techniques have been proposed • Relabeling nodes • Pattern mining • Lossless graph summarization → one of the most effective compression techniques • …
  • 8. Previous Graph Compression Techniques • Various compression techniques have been proposed • Relabeling nodes • Pattern mining • Lossless graph summarization → one of the most effective compression techniques • …
  • 9. Previous Graph Compression Techniques • Various compression techniques have been proposed • Relabeling nodes • Pattern mining • Lossless graph summarization → one of the most effective compression techniques • … • Lossless graph summarization is a batch algorithm for “static graphs”, which represent a single or a few snapshots of evolving graphs
  • 10. Previous Graph Compression Techniques • Various compression techniques have been proposed • Relabeling nodes • Pattern mining • Lossless graph summarization → one of the most effective compression techniques • … • Lossless graph summarization is a batch algorithm for “static graphs”, which represent a single or a few snapshots of evolving graphs. In fact, however, most real-world graphs undergo constant change...
  • 11. Real-world Graphs are Evolving 2M+ users → 2B+ users in 10 years
  • 12. Real-world Graphs are Evolving 2M+ users → 2B+ users in 10 years Previous algorithms: not designed to allow for changes in graphs → algorithms must be rerun from scratch to reflect changes
  • 13. Real-world Graphs are Evolving 2M+ users → 2B+ users in 10 years Previous algorithms: not designed to allow for changes in graphs → algorithms must be rerun from scratch to reflect changes Solution: incrementally update compressed graphs in a fast and effective manner!
  • 14. Outline • Preliminaries • Proposed Algorithm: MoSSo • Experimental Results • Conclusions
  • 15. Lossless Graph Summarization: Example Input graph with 11 edges over nodes a, b, c, d, e, f, g, h, i
  • 16. Lossless Graph Summarization: Example Input graph with 11 edges; supernodes A = {a} and B = {b, c, d, e}; nodes f, g, h, i remain
  • 17. Lossless Graph Summarization: Example Edge corrections so far: Delete {f, i}
  • 18. Lossless Graph Summarization: Example Edge corrections so far: Add {a, f} & Delete {f, i}
  • 19. Lossless Graph Summarization: Example Output with 4 edges: supernodes A = {a}, B = {b, c, d, e}, C = {f, g, h, i}, with corrections Add {a, f} & Delete {f, i}
  • 20. Lossless Graph Summarization: Definition Lossless summarization yields (1) a summary graph and (2) edge corrections, while minimizing the edge count |P| + |C+| + |C−| (≈ “description cost”, denoted by φ), for an input graph G = (V, E)
  • 21. Lossless Graph Summarization: Definition Summary graph G* = (S, P) with supernodes A = {a}, B = {b, c, d, e}, C = {f, g, h, i}
  • 22. Lossless Graph Summarization: Definition Edge corrections (C+, C−) with C+ = {af}, C− = {fi}
  • 23. Lossless Graph Summarization: Definition Proposed in [NRS08] based on “the Minimum Description Length principle”
  • 24. Lossless Graph Summarization: Definition Recap: input graph G = (V, E), summary graph G* = (S, P), edge corrections (C+, C−)
  • 25. Lossless Graph Summarization: Definition 1. Summary graph G* = (S, P) • Supernodes S = a partition of V, where each supernode is a set of nodes • Superedges P = a set of pairs of supernodes (ex: {A, B} in the example above)
  • 26. Lossless Graph Summarization: Definition Supernode example: A = {a}
  • 27. Lossless Graph Summarization: Definition Superedge example: {A, B}
  • 28. Lossless Graph Summarization: Definition 2. Edge corrections (C+, C−) • Positive residual graph C+ • Negative residual graph C−
  • 29. Lossless Graph Summarization: Notation • Supernode containing u: S_u (i.e., u ∈ S_u) • Edges between supernodes A and B: E_AB = {uv ∈ E : u ∈ A, v ∈ B (u ≠ v)} • All possible edges between A and B: T_AB = {uv ⊆ V : u ∈ A, v ∈ B (u ≠ v)} • Neighborhood of a node u: N(u) = {v ∈ V : uv ∈ E} • Nodes incident to u in C+ (or C−): C+(u) (or C−(u)) • Compression rate: (|P| + |C+| + |C−|) / |E|
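The description cost and compression rate in the notation above reduce to simple counts; a minimal sketch (the function names `description_cost` and `compression_rate` are mine, not from the paper):

```python
def description_cost(P, C_plus, C_minus):
    """phi = |P| + |C+| + |C-|: total number of stored (super)edges."""
    return len(P) + len(C_plus) + len(C_minus)

def compression_rate(P, C_plus, C_minus, E):
    """Output size relative to the input edge count |E| (lower is better)."""
    return description_cost(P, C_plus, C_minus) / len(E)
```

On the running example (2 superedges, C+ = {af}, C− = {fi}), the cost is φ = 2 + 1 + 1 = 4.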
  • 30. Lossless Graph Summarization: Optimal Encoding For summarization, determining the supernodes S (a partition of V) is our main concern → given S, the superedges P and edge corrections C are determined optimally
  • 31. Lossless Graph Summarization: Optimal Encoding Edges E_AB between two supernodes are encoded as either (1) a superedge with C− or (2) no superedge with C+ • Case 1: |E_AB| ≥ (|T_AB| + 1) / 2: add superedge AB to P and T_AB \ E_AB to C− (cost: 1 + |T_AB| − |E_AB|) • Case 2: |E_AB| < (|T_AB| + 1) / 2: add all edges in E_AB to C+ (cost: |E_AB|)
  • 32. Lossless Graph Summarization: Optimal Encoding With superedge AB: φ = 2 + 1 + 1 = 4
  • 33. Lossless Graph Summarization: Optimal Encoding With superedge AB: φ = 2 + 1 + 1 = 4 vs. C+ only: φ = 1 + 5 + 1 = 7 (with C+ = {ab, ac, ad, ae, af}, C− = {fi})
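The two-case encoding rule above can be sketched in a few lines of Python (a hedged illustration, not the authors' code; `encode_pair` and its return labels are my own, and distinct supernodes are assumed so that |T_AB| = |A|·|B|):

```python
def encode_pair(A, B, edges):
    """Optimally encode the edges between distinct supernodes A and B.
    Case 1: |E_AB| >= (|T_AB| + 1) / 2 -> superedge AB, missing pairs in C-
    Case 2: otherwise                  -> all of E_AB in C+
    Returns (choice, cost), where cost counts stored (super)edges."""
    E_AB = [(u, v) for (u, v) in edges
            if (u in A and v in B) or (u in B and v in A)]
    T_AB = len(A) * len(B)            # all possible edges between A and B
    if 2 * len(E_AB) >= T_AB + 1:     # Case 1: superedge is cheaper
        return "superedge+C-", 1 + T_AB - len(E_AB)
    return "C+ only", len(E_AB)       # Case 2
```

On the running example, the pair (A, B) with all four cross edges present is encoded as a superedge at cost 1, while the single edge {a, f} between A and C goes to C+ at cost 1.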
  • 34. Recovery: Example Summary: A = {a}, B = {b, c, d, e}, C = {f, g, h, i}; corrections C+ = {af}, C− = {fi}
  • 35. Recovery: Example Step 1: add all pairs of nodes between two adjacent supernodes
  • 36. Recovery: Example Step 2: remove all edges in C−
  • 37. Recovery: Example Step 3: add all edges in C+ → the original graph is recovered
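The three recovery steps above can be sketched directly (an illustrative version, not the paper's implementation; `recover` is my own name, and supernodes are modeled as frozensets of nodes):

```python
def recover(P, C_plus, C_minus):
    """Recover the original edge set from superedges P and
    edge corrections (C+, C-), following the three slide steps."""
    edges = set()
    # Step 1: add all pairs of nodes between two adjacent supernodes
    # (a self-superedge (A, A) expands to all pairs inside A)
    for (A, B) in P:
        for u in A:
            for v in B:
                if u != v:
                    edges.add(frozenset((u, v)))
    # Step 2: remove all edges in C-
    edges -= {frozenset(e) for e in C_minus}
    # Step 3: add all edges in C+
    edges |= {frozenset(e) for e in C_plus}
    return edges
```

With supernodes like those in the example (superedges {A, B} and {C, C}, C+ = {af}, C− = {fi}), this expands the superedges, drops f–i, and restores a–f.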
  • 38. Why Lossless Graph Summarization? Summary graph G* = (S, P) with A = {a}, B = {b, c, d, e}, C = {f, g, h, i}; edge corrections C+ = {af}, C− = {fi}
  • 39. Why Lossless Graph Summarization? • Queryable (retrieving the neighborhood of a query node)
  • 40. Why Lossless Graph Summarization? • Queryable (retrieving the neighborhood of a query node) • Queryability: a key building block in numerous graph algorithms (ex: DFS, PageRank, Dijkstra's, etc.) • Rapidly done from a summary and corrections
  • 41. Why Lossless Graph Summarization? • Queryable (retrieving the neighborhood of a query node) • Queryability: a key building block in numerous graph algorithms (ex: DFS, PageRank, Dijkstra's, etc.) • Rapidly done from a summary and corrections • Combinable • Its outputs are also graphs → they can be further compressed via other compression techniques!
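A neighborhood query N(u) can be answered without decompressing the whole graph: expand every supernode superedge-adjacent to S_u, then apply the corrections touching u. A minimal sketch under an assumed data layout of my own (dicts mapping each node to its supernode and each supernode to its superedge-adjacent supernodes):

```python
def neighbors(u, supernode_of, adj_super, C_plus, C_minus):
    """Answer N(u) from the summary graph and corrections:
    expand supernodes adjacent to S_u (including S_u itself when it
    has a self-superedge), drop C-(u), then add C+(u)."""
    result = set()
    Su = supernode_of[u]
    for B in adj_super.get(Su, ()):   # supernodes with a superedge to S_u
        result |= set(B)
    result.discard(u)                 # no self-loops
    result -= C_minus.get(u, set())   # negative corrections
    result |= C_plus.get(u, set())    # positive corrections
    return result
```

On the example, N(f) expands the self-superedge on C = {f, g, h, i}, removes i via C−, and adds a via C+, giving {a, g, h}.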
  • 42. Fully Dynamic Graph Stream A fully dynamic graph can be represented by a sequence {e_t} (t = 0, 1, ...) of edge additions e_t = (u, v)+ and deletions e_t = (u, v)− → the graph at time t is constructed by aggregating all edge changes up to time t
  • 43. Fully Dynamic Graph Stream Stream of changes: ...
  • 44. Fully Dynamic Graph Stream Starting from the empty graph G_0 at time t = 0
  • 45. Fully Dynamic Graph Stream Applying the stream of changes (+ − − + +) to the empty graph G_0...
  • 46. Fully Dynamic Graph Stream ...yields the current graph G_t at time t
  • 47. Fully Dynamic Graph Stream ...and the stream continues beyond time t
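Aggregating such a stream into G_t is straightforward; a small sketch using an event encoding of my own, ((u, v), '+') for an addition and ((u, v), '-') for a deletion:

```python
def aggregate(stream):
    """Build the graph G_t by replaying a fully dynamic edge stream:
    additions insert an (undirected) edge, deletions remove it."""
    edges = set()
    for (u, v), op in stream:
        e = frozenset((u, v))
        if op == "+":
            edges.add(e)
        else:
            edges.discard(e)
    return edges
```

Replaying the prefix up to any time t yields the snapshot G_t; an incremental summarizer must keep its output in sync with this evolving edge set instead of recomputing from scratch.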
• 52. Problem Formulation • Given: a fully dynamic graph stream {e_t}_{t=0}^∞ • Retain: the summary graph G_t* = (S_t, P_t) and edge corrections C_t = (C_t+, C_t−) of the graph G_t at each time t • To Minimize: the size of the output representation, |P_t| + |C_t+| + |C_t−|. Retained at time t: summary graph G* = (S, P) with A = {a}, B = {b, c, d, e}, C = {f, g, h, i} and edge corrections C+ = {(a, f)}, C− = {(f, i)}; then the edge change e_{t+1} arrives
• 53. Challenge: Fast Updates yet Good Compression Performance
  • 54. Outline • Preliminaries • Proposed Algorithm: MoSSo • Experimental Results • Conclusions
• 55. Scheme for Incremental Summarization Current graph on nodes a–i and its lossless summarization: summary graph G* with supernodes A = {a}, B = {b, c, d, e}, C = {f, g, h, i} and corrections C+ = {(a, f)}, C− = {(f, i)}, so φ = |P| + |C+| + |C−| = 4
• 57. Scheme for Incremental Summarization A new edge incident to a new node j arrives and is stored in the corrections, raising the cost to φ = |P| + |C+| + |C−| = 5. How should the current summarization be updated?
• 61. Scheme for Incremental Summarization Our approach: (1) attempt to move nodes among supernodes, (2) accept a move if φ decreases, and (3) reject it otherwise. A testing node is tentatively moved into a candidate supernode, and the resulting cost (currently φ = |P| + |C+| + |C−| = 5) is evaluated
• 62. Scheme for Incremental Summarization MoSSo finds (1) testing nodes whose move is likely to decrease φ and (2) candidate supernodes for each testing node that are likely to decrease φ
• 64. Scheme for Incremental Summarization Accepting the cost-decreasing move brings the representation back to φ = |P| + |C+| + |C−| = 4, with C+ = {(a, f)} and C− = {(f, i)}
• 65. MoSSo: Main Ideas Given a changed edge e = {u, v}, MoSSo decides which nodes to move (testing nodes) and into which supernodes to move them (candidates). • Step 1: Select testing nodes • (S1) No restoration of the input graph from the current summarization G_t* = (S_t, P_t), C_t = (C_t+, C_t−) • (S2) Reduce redundant testing by stochastic filtering • Step 2: Find candidates • (S3) Utilize an incremental coarse clustering • (S4) Inject flexibility into the reorganization of supernodes
• 67. MoSSo: Details Input: • summary graph G_t* and edge corrections C_t • edge change {u, v}+ (addition) or {u, v}− (deletion) Parameters: • sample number c • escape probability e Output: • summary graph G_{t+1}* and edge corrections C_{t+1}
• 73. MoSSo: Details (Step 1) – MCMC Notation: N(u) denotes the neighborhood of a node u. The neighborhood N(u) of an endpoint u of the changed edge e = {u, v} is the most likely to be affected, so MoSSo focuses on testing nodes in N(u). P1. To sample neighbors, one would have to retrieve all of N(u) from G* and C, which takes O(average degree) time on average. This is deadly to scalability: by the graph densification law [LKF05], "the average degree of real-world graphs increases over time."
• 74. MoSSo: Details (Step 1) – MCMC (cont.) S1. Without fully retrieving N(u), sample c neighbors uniformly at random using the Markov Chain Monte Carlo (MCMC) method. An MCMC method samples from a random variable whose probability density is proportional to a given function.
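As a reminder of the principle this step relies on, here is a generic Metropolis-Hastings sketch that samples from a distribution proportional to a given (unnormalized, positive) function f without ever computing the normalization constant. The weight function, state set, and step counts below are illustrative assumptions, not values from the paper.

```python
import random

def metropolis(f, states, steps, seed):
    """Draw one sample whose stationary distribution is proportional to f,
    using a uniform symmetric proposal and the Metropolis acceptance rule
    (f must be positive on all states)."""
    rng = random.Random(seed)
    x = rng.choice(states)
    for _ in range(steps):
        y = rng.choice(states)                     # symmetric proposal
        if rng.random() < min(1.0, f(y) / f(x)):
            x = y                                  # accept; otherwise stay at x
    return x

# States weighted proportionally to their value: 4 should appear ~4x as often as 1.
draws = [metropolis(lambda s: s, [1, 2, 3, 4], steps=200, seed=i)
         for i in range(400)]
print(draws.count(4) > draws.count(1))             # almost surely True
```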
• 79. MoSSo: Details (Step 1) – Probabilistic Filtering Should all the sampled nodes be tested? Better not. P2. High-degree nodes would be tested too frequently, since P(v is sampled) ∝ deg(v), and testing them is computationally heavy (too many neighbors are involved in updating the optimal encoding and computing the change Δφ in the description cost). S2. Test a sampled node v with probability 1/deg(v): (1) expensive testing on high-degree nodes is likely avoided, and (2) in expectation, P(v is actually tested) is the same across all nodes v (i.e., the imbalance in the number of testings is smoothed out).
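A quick simulation (on a hypothetical toy graph, not from the paper) shows why the 1/deg(v) filter equalizes testing: a sampling probability proportional to deg(v) multiplied by an acceptance probability of 1/deg(v) gives every node roughly the same chance of being tested.

```python
import random
from collections import Counter

def simulate_filtering(adj, trials, seed=0):
    """Sample a node with P(v) proportional to deg(v), as edge-driven
    sampling does, then actually test it w.p. 1/deg(v); count tests."""
    rng = random.Random(seed)
    # node v appears deg(v) times, so choice() samples proportionally to degree
    weighted = [v for v, nbrs in adj.items() for _ in nbrs]
    tested = Counter()
    for _ in range(trials):
        v = rng.choice(weighted)              # P(v) ∝ deg(v)
        if rng.random() < 1.0 / len(adj[v]):  # probabilistic filter
            tested[v] += 1
    return tested

# Star graph: the hub has degree 4, each leaf degree 1.
adj = {'hub': {'l1', 'l2', 'l3', 'l4'},
       'l1': {'hub'}, 'l2': {'hub'}, 'l3': {'hub'}, 'l4': {'hub'}}
tested = simulate_filtering(adj, trials=8000)
# Despite its 4x sampling rate, the hub is tested about as often as a leaf.
print(tested['hub'], tested['l1'])
```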
• 84. MoSSo: Details (Step 2) – Coarse Clustering P3. Among many choices, how do we find "good" candidates for a testing node, i.e., those likely to decrease φ? S3. Utilize an incremental coarse clustering in which nodes with similar connectivity fall into the same cluster; any incremental coarse clustering with this property can be used. MoSSo uses min-hashing, which (1) is fast and has the desirable theoretical property P(u, v in the same cluster) ∝ Jaccard(N(u), N(v)), i.e., it groups nodes with similar connectivity, and (2) has clusters that are updated rapidly in response to edge changes.
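A minimal min-hash sketch (the hash construction below is an assumption; MoSSo's actual data structures differ): two nodes receive equal signatures per hash with probability equal to the Jaccard similarity of their neighborhoods, so grouping nodes by signature yields coarse clusters of similarly connected nodes.

```python
import random

def make_hashes(k, seed=0):
    """k independent salted hash functions over node labels."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(k)]
    return [lambda v, s=s: hash((s, v)) for s in salts]

def minhash_key(neighborhood, hashes):
    """Min-hash signature of a (non-empty) neighborhood; for each hash,
    P(two sets share the minimum) = Jaccard of the two sets."""
    return tuple(min(h(v) for v in neighborhood) for h in hashes)

adj = {'b': {'a', 'f'}, 'c': {'a', 'f'},   # identical neighborhoods
       'x': {'y', 'z'}}                    # disjoint neighborhood
hs = make_hashes(4, seed=1)
print(minhash_key(adj['b'], hs) == minhash_key(adj['c'], hs))  # True
print(minhash_key(adj['b'], hs) == minhash_key(adj['x'], hs))  # False (w.h.p.)
```

Because signatures depend only on the node's own neighborhood, an edge change touches only the signatures of its two endpoints, which is what makes this kind of clustering cheap to maintain incrementally.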
• 90. MoSSo: Details (Step 2) – Separation of Node P4. Moving nodes among existing supernodes can only decrease or maintain |S|, which discourages reorganizing the supernodes in the long run. S4. Instead of only moving the testing node y to a candidate, with escape probability e, separate y from its supernode S_y and create a singleton supernode {y}. This injects flexibility into the supernodes (a partition of V) and empirically brings significant improvement in compression rates. As before, the separation is accepted or rejected depending on Δφ.
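The proposal step, including separation, can be sketched as below. This is a simplified illustration with hypothetical names; candidate sampling and the Δφ-based accept/reject test are assumed to happen elsewhere.

```python
import random

def propose(y, candidates, escape_prob, rng):
    """With probability escape_prob, propose splitting y out into a fresh
    singleton supernode; otherwise propose moving y into a sampled candidate
    supernode. Either proposal is then accepted only if the cost decreases."""
    if not candidates or rng.random() < escape_prob:
        return ('separate', y)
    return ('move', rng.choice(candidates))

rng = random.Random(0)
print(propose('y', ['S1', 'S2'], escape_prob=1.0, rng=rng))  # -> ('separate', 'y')
```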
  • 91. Outline • Preliminaries • Proposed Algorithm: MoSSo • Experimental Results • Conclusions
• 97. Experimental Settings • 10 real-world graphs (up to 0.3B edges) from web, social, collaboration, email, and other domains • Baseline batch lossless graph summarization algorithms: Randomized [NSR08], SAGS [KNL15], SWeG [SGKR19]
• 98. Baseline Incremental Algorithms • MoSSo-Greedy: greedily moves the nodes related to an inserted/deleted edge, while fixing the other nodes, so that the objective is minimized • MoSSo-MCMC: see the paper for details • MoSSo-Simple: MoSSo without coarse clustering
• 101. Experimental Results: Speed • MoSSo processed each change up to 7 orders of magnitude faster than re-running the fastest batch algorithm, on both insertion-only graph streams (e.g., UK) and fully dynamic graph streams
• 104. Experimental Results: Compression Performance Notation: compression ratio = (|P| + |C+| + |C−|) / |E| • The compression ratios of MoSSo were comparable even to those of the best batch algorithms • MoSSo achieved the best compression ratios among the streaming algorithms (datasets: PR, EN, FB, DB, YT, SK, LJ, EU, HW, UK)
• 106. Experimental Results: Scalability • MoSSo processed each change in near-constant time, e.g., on EU (insertion-only) and SK (fully dynamic)
  • 107. Outline • Preliminaries • Proposed Algorithm: MoSSo • Experimental Results • Conclusions
• 112. Conclusions We propose MoSSo, the first algorithm for incremental lossless graph summarization. MoSSo is fast and 'any time', effective, and scalable. The code and datasets used in the paper are available at http://dmlab.kaist.ac.kr/mosso/
  • 113. Incremental Lossless Graph Summarization Jihoon Ko* Yunbum Kook* Kijung Shin