Incremental and parallel computation of structural graph summaries for evolving graphs
1. Incremental and Parallel Computation of Structural Graph Summaries for Evolving Graphs
Till Blume¹, David Richerby², and Ansgar Scherp³
CIKM 2020, Virtual Event
¹ Kiel University, Germany
² University of Essex, United Kingdom
³ Ulm University, Germany
2. Structural Graph Summaries
Structural graph summaries are condensed representations of graphs: with respect
to a chosen set of structural features, the graph summary is equivalent to the
original graph.
[Figure: Input Graph, Structural Features (f1, ..., fx), Structural Graph Summary]
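The definition above can be illustrated with a minimal sketch. It assumes, purely for illustration, a single structural feature (the set of outgoing edge labels of a vertex) and a toy edge list; the names `EDGES`, `out_labels`, and `summarize` are hypothetical and not taken from the authors' code.

```python
from collections import defaultdict

# Toy input graph as (subject, label, object) edges; all names are made up.
EDGES = [
    ("alice", "name", "Alice"),
    ("alice", "knows", "bob"),
    ("bob", "name", "Bob"),
    ("bob", "knows", "alice"),
    ("acme", "name", "ACME"),
]

def out_labels(edges):
    """Assumed feature: map each vertex to the set of its outgoing edge labels."""
    labels = defaultdict(set)
    for s, p, _ in edges:
        labels[s].add(p)
    return {v: frozenset(ls) for v, ls in labels.items()}

def summarize(edges, feature=out_labels):
    """Group vertices whose feature values are equal into one summary class."""
    summary = defaultdict(set)
    for v, key in feature(edges).items():
        summary[key].add(v)
    return dict(summary)

summary = summarize(EDGES)
for key, members in sorted(summary.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(key), "->", sorted(members))
```

Here "alice" and "bob" fall into the same class because they share the same outgoing labels, while "acme" forms its own class. The summary keeps one entry per distinct feature value instead of one per vertex.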
4. Problem Definition
● many different structural features can be used to summarize a graph
● when the input graph changes, it is often prohibitively expensive to recompute
the structural graph summary from scratch
● existing incremental algorithms are often not designed for evolving graphs or
require an explicit change log
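To make the last point concrete, the following is a minimal sketch of updating a summary without a change log. It is not the authors' algorithm: it assumes a hypothetical `IncrementalSummary` structure that stores, for each vertex, the summary class it was last assigned to, and detects changes by comparing that stored class with the one computed from the vertex's current edges. The feature used (outgoing edge labels) is also an assumption.

```python
from collections import defaultdict

class IncrementalSummary:
    """Hypothetical structure: summary classes plus a vertex -> class index."""

    def __init__(self):
        self.members = defaultdict(set)  # class key -> vertices in that class
        self.key_of = {}                 # vertex -> class key last assigned

    def observe(self, vertex, out_edges):
        """Re-evaluate one vertex from its current (label, target) edges.

        Returns True if the summary had to be changed.
        """
        new_key = frozenset(label for label, _ in out_edges)
        old_key = self.key_of.get(vertex)
        if new_key == old_key:
            return False                 # nothing to do, no change log needed
        if old_key is not None:
            self._remove(vertex, old_key)
        self.key_of[vertex] = new_key
        self.members[new_key].add(vertex)
        return True

    def forget(self, vertex):
        """Drop a vertex that no longer exists in the input graph."""
        old_key = self.key_of.pop(vertex, None)
        if old_key is not None:
            self._remove(vertex, old_key)

    def _remove(self, vertex, key):
        self.members[key].discard(vertex)
        if not self.members[key]:
            del self.members[key]        # delete classes that became empty

s = IncrementalSummary()
s.observe("alice", [("name", "Alice"), ("knows", "bob")])
s.observe("bob", [("name", "Bob")])
# A later version of the graph: bob gained an edge, alice is unchanged.
changed = s.observe("bob", [("name", "Bob"), ("knows", "alice")])
unchanged = s.observe("alice", [("knows", "bob"), ("name", "Alice")])
```

Only the vertices whose class actually differs cause work; re-observing an unchanged vertex costs one comparison.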
5. Contribution
1. generic, parallel algorithm to incrementally compute and update structural
graph summaries, as well as a generic data structure following our formal
language
2. theoretical complexity analysis: all graph summaries defined in the formal
language can be updated in O(∆·d^k), where ∆ is the number of changes to the
input graph, d is the maximum degree of the input graph, and k is the maximum
distance in the subgraphs considered for the equivalence
3. empirical analyses on benchmark and real-world datasets: our
incremental algorithm outperforms a batch computation even when about 50%
of the graph has changed
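One way to build intuition for the O(∆·d^k) bound, offered as an informal reading rather than the paper's proof: a change at one vertex can only influence vertices within distance k, and with maximum degree d there are at most 1 + d + ... + d^k such vertices. The sketch below checks this on a small cycle graph. The helper `within_distance` and the undirected toy graph are assumptions made for the example.

```python
from collections import deque

def within_distance(adjacency, sources, k):
    """Vertices reachable from any source in at most k hops (multi-source BFS)."""
    dist = {v: 0 for v in sources}
    queue = deque(sources)
    while queue:
        v = queue.popleft()
        if dist[v] == k:
            continue                      # do not expand beyond distance k
        for w in adjacency.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return set(dist)

# Toy graph: a cycle of 8 vertices, so every vertex has degree d = 2.
adjacency = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
d = max(len(neighbours) for neighbours in adjacency.values())
k = 2
changed_vertices = [0]                    # delta = 1 change

affected = within_distance(adjacency, changed_vertices, k)
bound = len(changed_vertices) * sum(d ** i for i in range(k + 1))
print(len(affected), "<=", bound)
```

The set of possibly affected vertices depends on the number of changes, the degree, and k, but not on the total size of the graph.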
8. Experimental Evaluation
Datasets
● LUBM100 (~2.1 M vertices and ~13 M edges)
● BSBM (up to 1.3 M vertices and 13 M edges)
● DyLDO-core (2.1–3.5 M vertices and 7–13 M edges)
● DyLDO-ext (7–10 M vertices and 84–106 M edges)
Summary Models
● Attribute Collection
● Type Collection
● SchemEX
In total, 312 experiments each for the incremental and the batch computation
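The three summary models differ in which features decide whether two vertices are equivalent. The sketch below uses simplified readings of the models that are assumptions of this example, not the paper's formal definitions: Attribute Collection compares outgoing property labels, Type Collection compares type sets, and SchemEX compares type sets together with the properties and the type sets of the linked neighbors. `TYPE`, `vertex_keys`, and `same_class` are hypothetical names.

```python
from collections import defaultdict

TYPE = "type"  # stand-in for rdf:type; an assumption of this sketch

EDGES = [
    ("alice", TYPE, "Person"),
    ("alice", "worksFor", "acme"),
    ("bob", TYPE, "Person"),
    ("bob", "worksFor", "uni"),
    ("acme", TYPE, "Company"),
    ("uni", TYPE, "University"),
]

def vertex_keys(edges, model):
    """Compute, per subject vertex, the key that defines its summary class."""
    types = defaultdict(set)   # vertex -> set of types
    props = defaultdict(set)   # vertex -> set of (label, target) pairs
    for s, p, o in edges:
        if p == TYPE:
            types[s].add(o)
        else:
            props[s].add((p, o))
    keys = {}
    for v in {s for s, _, _ in edges}:
        labels = frozenset(p for p, _ in props[v])
        own_types = frozenset(types[v])
        if model == "attribute":
            keys[v] = labels
        elif model == "type":
            keys[v] = own_types
        elif model == "schemex":
            # Each property is paired with the type set of the vertex it links to.
            linked = frozenset((p, frozenset(types[o])) for p, o in props[v])
            keys[v] = (own_types, linked)
        else:
            raise ValueError(f"unknown model: {model}")
    return keys

def same_class(edges, model, a, b):
    """True if vertices a and b are equivalent under the given model."""
    keys = vertex_keys(edges, model)
    return keys[a] == keys[b]

for model in ("attribute", "type", "schemex"):
    print(model, same_class(EDGES, model, "alice", "bob"))
```

In this toy graph "alice" and "bob" are equivalent under the two simpler models but not under the SchemEX-style key, because their employers have different types. More specific models yield larger summaries.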
12. Conclusion
1. generic, parallel algorithm to incrementally compute and update structural graph
summaries, as well as a generic data structure following our formal language
2. theoretical complexity analysis: all graph summaries defined in the formal
language can be updated in O(∆·d^k), where ∆ is the number of changes to the
input graph, d is the maximum degree of the input graph, and k is the maximum
distance in the subgraphs considered for the equivalence
3. empirical analyses on benchmark and real-world datasets: our incremental
algorithm outperforms a batch computation even when about 50% of the graph
has changed
Source code and all resources are available on GitHub:
https://github.com/t-blume/fluid-spark