Approximate and Incremental Processing of Complex Queries against the Web of Data
1. Approximate and Incremental Processing of
Complex Queries against the Web of Data
Thanh Tran, Günter Ladwig, Andreas Wagner
DEXA 2011
Institute of Applied Informatics and Formal Description Methods (AIFB)
KIT – University of the State of Baden-Württemberg and
National Large-scale Research Center of the Helmholtz Association www.kit.edu
2. Contents
Introduction
Overview
Approximate & Incremental Processing
  Entity Search
  Approximate Structure Matching
  Structure-based Result Refinement and Computation
Evaluation
Conclusion
2 August 31st, 2011 DEXA 2011, Toulouse, France Institute of Applied Informatics and Formal Description Methods (AIFB)
3. INTRODUCTION
4. Introduction – Data Model
Resource Description Framework (RDF)
[Figure: example RDF data graph. Nodes: p1, p2, p3, p4, p5, a1, a2, c1, c2, i1, i2, u1, u2. Relation edges: authorOf, supervises, worksAt, partOf, knows, conference. Attribute edge: name (values P2, P5, U1).]
5. Introduction – Query Model
Basic Graph Patterns
Conjunctive queries over RDF data: graph pattern matching
[Figure: example query graph over the variables w, x, y, z, u, v. Relation edges: supervise, worksAt, partOf, author, conf. Attribute edges: name (AIFB, KIT, ICDE) and age (29).]
6. Contribution
Techniques for matching (basic) query patterns against graph-
structured data have limits
We might wish to trade completeness and exactness for
responsiveness
Our approach allows an “affordable” computation of an initial set
of approximate results, which can be incrementally refined as
needed.
7. Contribution – Pipeline Overview
Pipeline of operations where approximate results are refined
incrementally
[Figure: pipeline Entity Search → Approximate Structure Matching → Structure-based Result Refinement → Structure-based Answer Computation, passing intermediate, approximate results from step to step. Indexes used: Entity & Neighborhood Index, Structure Index, Relation Index.]
8. ENTITY SEARCH
9. Entity Search
Entity index
Stores attribute edges of the data graph
Enables lookup of entities by attribute and value
Entity search
Obtains candidate bindings for all variables in the query that have
attribute edges
Does not consider structure (i.e., relations between entities)
Query decomposition and transformation
Decompose query into entity queries to create a transformed
query
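A minimal sketch of the entity index idea, assuming attribute edges are available as (entity, attribute, value) triples. The sample values and the function names `build_entity_index` and `entity_search` are illustrative assumptions, not taken from the talk.

```python
from collections import defaultdict

# Hypothetical attribute edges (entity, attribute, literal value).
# The values are illustrative and not taken from the talk's data sets.
ATTRIBUTE_EDGES = [
    ("p1", "age", "29"),
    ("p2", "age", "35"),
    ("i1", "name", "AIFB"),
    ("u1", "name", "KIT"),
    ("c1", "name", "ICDE"),
]


def build_entity_index(edges):
    """Map each (attribute, value) pair to the set of entities carrying it."""
    index = defaultdict(set)
    for entity, attribute, value in edges:
        index[(attribute, value)].add(entity)
    return index


def entity_search(index, entity_query):
    """Return the entities that satisfy every (attribute, value) constraint.

    Relations between entities are deliberately ignored at this stage.
    """
    candidate_sets = [index.get(pair, set()) for pair in entity_query]
    if not candidate_sets:
        return set()
    return set.intersection(*candidate_sets)


index = build_entity_index(ATTRIBUTE_EDGES)
x_candidates = entity_search(index, [("age", "29")])
z_candidates = entity_search(index, [("name", "AIFB")])
```

Each entity query is answered by a lookup per constraint followed by a set intersection, so no joins over relation edges are needed here.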
10. Query Decomposition & Transformation
[Figure: example query graph over the variables w, x, y, z, u, v, as on the Query Model slide.]
Identify entity queries
Breadth-first search starting from a random variable
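The decomposition step can be sketched as follows. This is an illustration under assumptions: the query is written as triple patterns whose edge directions are guessed from the drawing, variables are marked with a leading `?`, and the start variable is passed explicitly (the slides pick it at random) so the output is reproducible.

```python
from collections import defaultdict, deque

# The running example query as triple patterns. Edge directions are an
# assumption; the extracted slide text does not preserve them.
QUERY = [
    ("?w", "supervise", "?x"),
    ("?x", "age", "29"),
    ("?x", "worksAt", "?z"),
    ("?x", "author", "?y"),
    ("?y", "conf", "?v"),
    ("?z", "name", "AIFB"),
    ("?z", "partOf", "?u"),
    ("?u", "name", "KIT"),
    ("?v", "name", "ICDE"),
]


def is_variable(term):
    return term.startswith("?")


def decompose(query, start):
    """Split a query into entity queries and the remaining relation edges.

    An entity query collects all attribute edges (constant object) of one
    variable. Variables are visited breadth-first over relation edges.
    """
    attributes = defaultdict(list)
    neighbors = defaultdict(set)
    relation_edges = []
    for s, p, o in query:
        if is_variable(o):
            relation_edges.append((s, p, o))
            neighbors[s].add(o)
            neighbors[o].add(s)
        else:
            attributes[s].append((p, o))

    entity_queries = {}
    seen = {start}
    queue = deque([start])
    while queue:
        var = queue.popleft()
        if attributes[var]:
            entity_queries[var] = attributes[var]
        for nxt in sorted(neighbors[var] - seen):
            seen.add(nxt)
            queue.append(nxt)
    return entity_queries, relation_edges


entity_queries, relation_edges = decompose(QUERY, "?x")
```

In this sketch, `entity_queries` holds one entry per variable with attribute edges, and `relation_edges` is what the later pipeline steps still have to check.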
11. Query Decomposition & Transformation
[Figure: example query graph before collapsing, as on the previous slide.]
Collapse entity queries
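[Figure: transformed query. Attribute edges are collapsed into entity queries: z (name AIFB), u (name KIT), x (age 29), v (name ICDE). The relation edges supervise, worksAt, partOf, author, conf between w, x, y, z, u, v remain.]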
12. Entity Search Results
Use entity index to obtain bindings for all entity queries in
transformed query
Entity queries are necessary conditions, but not sufficient
Final results will be a subset
x   z   u   v
p1  i1  u1  c1
p3  i1  u1  c1
p5  i1  u1  c1
p6  i1  u1  c1
[Figure: transformed query with collapsed entity queries, as on the previous slide.]
13. APPROXIMATE STRUCTURE MATCHING
14. Approximate Structure Matching
Only entity parts of the query have been matched
Relation edges have yet to be processed
Instead of performing exact equijoins we propose to perform a
neighborhood join
The k-neighborhood of an entity e is the set of entities in the data graph
that can be reached from e via a path of relation edges of length k or less.
Neighborhood join allows us to check whether two entities are
connected via relation edges (but not which ones)
A neighborhood join between two sets of entities E1, E2 is an equijoin
between all pairs e1 ∈ E1, e2 ∈ E2 where e1 and e2 are considered
equivalent if the intersection of their k-neighborhoods is non-empty.
Again: necessary, but not sufficient
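The two definitions above can be sketched with plain sets. The edge list is hypothetical, relation edges are traversed in both directions, and an entity counts as part of its own neighborhood (path of length 0); all three are assumptions made for this illustration.

```python
from collections import defaultdict

# Hypothetical relation edges, loosely modeled on the example data graph.
RELATION_EDGES = [
    ("p1", "worksAt", "i1"),
    ("i1", "partOf", "u1"),
    ("p3", "worksAt", "i2"),
    ("i2", "partOf", "u2"),
]


def build_adjacency(edges):
    """Adjacency sets over relation edges, ignoring labels and direction."""
    adjacency = defaultdict(set)
    for s, _, o in edges:
        adjacency[s].add(o)
        adjacency[o].add(s)  # assumption: edges are followed both ways
    return adjacency


def k_neighborhood(adjacency, entity, k):
    """Entities reachable from `entity` via at most k relation edges."""
    reached = {entity}
    frontier = {entity}
    for _ in range(k):
        frontier = {n for e in frontier for n in adjacency.get(e, ())} - reached
        if not frontier:
            break
        reached |= frontier
    return reached


def neighborhood_join(adjacency, left, right, k):
    """Keep pairs (e1, e2) whose k-neighborhoods share at least one entity."""
    hoods = {e: k_neighborhood(adjacency, e, k) for e in set(left) | set(right)}
    return [
        (e1, e2)
        for e1 in sorted(left)
        for e2 in sorted(right)
        if hoods[e1] & hoods[e2]
    ]


adjacency = build_adjacency(RELATION_EDGES)
pairs = neighborhood_join(adjacency, {"p1", "p3"}, {"i1"}, k=1)
```

The join only establishes that a connection exists within distance k. It does not check edge labels, which is why the result is a necessary but not sufficient condition.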
15. Neighborhood Join via Bloom Filters
We store the set of k-neighborhood entities as a bloom filter
Bloom filter
Space-efficient, probabilistic data structure for set membership test
False positives are possible (false negatives are not)
We refine the results of the previous step
To perform a neighborhood join between bindings E1, E2
Load bloom filters for one set of entities, say E1
In a nested loop manner, check if entities in E2 are contained in the
bloom filter
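A small sketch of the nested-loop check described above, assuming one Bloom filter per entity on the E1 side. The `BloomFilter` class, its sizes, and the sample neighborhoods are illustrative assumptions; the talk does not specify filter parameters or hash functions.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; sizes are illustrative, not tuned values."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int serves as the bit array

    def _positions(self, item):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def bloom_neighborhood_join(filters, left, right):
    """Nested loop: keep (e1, e2) if e2 may be in e1's k-neighborhood."""
    return [(e1, e2) for e1 in left for e2 in right if e2 in filters[e1]]


# Hypothetical precomputed k-neighborhoods for the bindings of one variable.
neighborhoods = {"p1": {"i1", "u1"}, "p3": {"i2", "u2"}}
filters = {}
for entity, hood in neighborhoods.items():
    bloom = BloomFilter()
    for member in hood:
        bloom.add(member)
    filters[entity] = bloom

pairs = bloom_neighborhood_join(filters, ["p1", "p3"], ["i1"])
```

Because false negatives cannot occur, the pair ("p1", "i1") is always kept. A pair such as ("p3", "i1") is dropped unless the filter happens to produce a false positive, which later pipeline steps would still eliminate.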
16. Neighborhood Join via Bloom Filters
[Figure: example query graph with the k=1 and k=2 neighborhoods of x marked.]
Load bloom filters for entities bound to x
Check whether entities bound to w, y, and z are in the neighborhood
of x
When k=2, bloom filters for x also cover u and v
17. STRUCTURE-BASED RESULT REFINEMENT
18. Structure-based Result Refinement
From approximate structure matching (ASM) we know that entities in
intermediate results are connected
Necessary, but not sufficient.
With structure-based result refinement we find out whether they
are connected via paths captured by query atoms
Query is matched against a structure index graph
Bisimulation-based summary of data graph that captures structural
information
Nodes in the data graph with the same “structure” are grouped
together
Much smaller than the data graph
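One way to picture such a summary is naive partition refinement: nodes start in a single block and are split until nodes in the same block have the same labeled incoming and outgoing edges into the same blocks. This is a sketch under assumptions (toy edge list, both edge directions considered, a simple quadratic refinement loop) and not necessarily the construction used for the structure index in the talk.

```python
# Hypothetical relation edges, loosely modeled on the example data graph.
EDGES = [
    ("p1", "worksAt", "i1"),
    ("p3", "worksAt", "i2"),
    ("i1", "partOf", "u1"),
    ("i2", "partOf", "u2"),
    ("p1", "authorOf", "a1"),
    ("p3", "authorOf", "a2"),
]
NODES = sorted({n for s, _, o in EDGES for n in (s, o)})


def bisimulation_blocks(nodes, edges):
    """Group nodes with the same 'structure' by iterated refinement."""
    block_of = {n: 0 for n in nodes}
    while True:
        signatures = {}
        for n in nodes:
            outgoing = frozenset((p, block_of[o]) for s, p, o in edges if s == n)
            incoming = frozenset((p, block_of[s]) for s, p, o in edges if o == n)
            # Including the old block makes every round a refinement.
            signatures[n] = (block_of[n], outgoing, incoming)
        ids = {}
        refined = {n: ids.setdefault(signatures[n], len(ids)) for n in nodes}
        if len(ids) == len(set(block_of.values())):
            return refined  # no block was split: the partition is stable
        block_of = refined


def structure_index(nodes, edges):
    """Return block assignment, block extensions, and index graph edges."""
    block_of = bisimulation_blocks(nodes, edges)
    extensions = {}
    for node, block in block_of.items():
        extensions.setdefault(block, set()).add(node)
    index_edges = {(block_of[s], p, block_of[o]) for s, p, o in edges}
    return block_of, extensions, index_edges


block_of, extensions, index_edges = structure_index(NODES, EDGES)
```

On this toy input the six data edges collapse into three index edges over four blocks, which illustrates why the index graph can be much smaller than the data graph.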
19. Structure Index Bisimulation
[Figure: data graph G and its structure index graph G~. The nodes of G~ group data graph nodes into extensions: {p2, p4}, {p1, p3}, {p5}, {a1, a2}, {c1, c2}, {i1, i2}, {u1, u2}.]
20. Structure-based Result Refinement
We take advantage of this property:
Whenever there is a match of a query graph q on G the query also
matches on G~. Moreover, extensions of the index graph
matches will contain all data graph matches, i.e. the bindings to
query variables.
Match the query against the structure index graph to obtain sets
of extensions that contain potential query answers
Bindings computed in previous ES/ASM steps can only be
answers if they are contained in the matched extensions
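The filtering step can be sketched as follows: match the query's relation edges on the index graph, then keep only candidate bindings that fall into the extensions of some index match. The block names, extensions, candidate rows, and the simple backtracking matcher are assumptions made for illustration.

```python
# Hypothetical structure index: block names and extensions are illustrative.
INDEX_EDGES = [
    ("E_person", "worksAt", "E_inst"),
    ("E_inst", "partOf", "E_univ"),
]
EXTENSIONS = {
    "E_person": {"p1", "p3"},
    "E_inst": {"i1", "i2"},
    "E_univ": {"u1", "u2"},
    "E_other": {"p5"},  # p5 has no worksAt edge in this toy example
}
QUERY = [("?x", "worksAt", "?z"), ("?z", "partOf", "?u")]


def match(query, graph_edges, binding=None):
    """Yield bindings that map every query edge onto some graph edge."""
    binding = binding or {}
    if not query:
        yield dict(binding)
        return
    (s, p, o), rest = query[0], query[1:]
    for gs, gp, go in graph_edges:
        if gp != p or (s == o and gs != go):
            continue
        if binding.get(s, gs) != gs or binding.get(o, go) != go:
            continue
        yield from match(rest, graph_edges, {**binding, s: gs, o: go})


def refine(rows, query, index_edges, extensions):
    """Keep rows whose entities lie in the extensions of one index match."""
    matches = list(match(query, index_edges))
    return [
        row
        for row in rows
        if any(
            all(row[v] in extensions[m[v]] for v in row if v in m)
            for m in matches
        )
    ]


# Candidate bindings as they might arrive from the ES/ASM steps.
candidates = [
    {"?x": "p1", "?z": "i1", "?u": "u1"},
    {"?x": "p5", "?z": "i1", "?u": "u1"},
]
refined = refine(candidates, QUERY, INDEX_EDGES, EXTENSIONS)
```

The second candidate is dropped because p5 lies outside every extension that an index match assigns to ?x. Rows that survive are still only potential answers.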
21. STRUCTURE-BASED ANSWER COMPUTATION
22. Structure-based Answer Computation
Finally, results which exactly match the query are computed by
the last refinement.
Only for this step, we actually perform joins on the data.
23. EVALUATION
24. Evaluation
Systems
INC: the proposed approach
VP: join processing using vertical partitioning with sextuple indexing
Datasets
DBLP: 13M triples
LUBM: 0.7M – 6.7M triples
Queries
Generated 80 queries via random sampling
Different shapes: path, star, graph
25. Results – Average Processing Time
26. Results – Average Processing Time
[Figure: average processing time; label: Neighborhood Distance]
27. Results – Precision vs. Time
28. Results – Precision
29. Conclusion
We proposed a novel process for approximate and
incremental processing of complex graph pattern queries
Initial results are computed in a small fraction of total time and
then incrementally refined via approximate matching at low cost
Increased responsiveness as inexact results are available early
Users can decide if and for which results exactness and
completeness are desirable
Experiments show that our approach is relatively fast w.r.t. exact
and complete results, indicating that the proposed mechanism is
able to reuse intermediate results
31. BACKUP SLIDES