1. Parallel Data Loading during Querying Deep Web and Linked Open Data with SPARQL
Pauline Folz 1,2, Gabriela Montoya 1,3, Hala Skaf-Molli 1, Pascal Molli 1 and Maria-Esther Vidal 4
1 LINA, Nantes University, France
2 Nantes Métropole - Direction Recherche, Innovation et Enseignement Supérieur
3 Centre National de la Recherche Scientifique (CNRS), France
4 Universidad Simon Bolivar, Venezuela
SSWS2015@ISWC2015
2. Querying Linked Open Data with SPARQL
• Who in the Semantic Web community knows a well-known person?
SELECT DISTINCT *
WHERE {
  ?P foaf:member ?C .
  ?C rdfs:label "Semantic Web" .
  ?P foaf:knows ?WKP .
  ?WKP foaf:name ?N .
  FILTER(?N = "Barack Obama")
}
[Figure: the query sent to LOD data sources returns no results]
3. Querying Deep Web and Linked Open Data with SPARQL
• Who in the Semantic Web community knows a well-known person?
SELECT DISTINCT *
WHERE {
  ?P foaf:member ?C .
  ?C rdfs:label "Semantic Web" .
  ?P foaf:knows ?WKP .
  ?WKP foaf:name ?N .
  FILTER(?N = "Barack Obama")
}
[Figure: the query sent to LOD data sources and Deep Web data sources returns results]
4.
P. Folz, G. Montoya, H. Skaf-Molli, P. Molli, and M. Vidal. Semlav: Querying deep web and linked
open data with SPARQL. Demo ESWC 2014, Revised Selected Papers, pages 332–337, 2014.
Video available at: https://www.youtube.com/watch?v=z7w31f-ybuQ
5. SemLAV: Local-As-View Mediation for SPARQL
G. Montoya, L. D. Ibánez, H. Skaf-Molli, P. Molli, and M.-E. Vidal. SemLAV: Local-As-View Mediation for SPARQL. Transactions on Large-Scale Data- and Knowledge-Centered Systems, LNCS, Vol. 8420, pages 33–58, 2014.
Query:
Q(P,C,WKP,N) :- member(P,C), label(C,"Semantic Web"), knows(P,WKP), name(WKP,"Barack Obama")
LAV mappings:
v1(P,A,I,C,L) :- made(P,A), affiliation(P,I), member(P,C), label(C,L)
v2(A,T,P,N,C) :- title(A,T), made(P,A), name(P,N), member(P,C)
v3(P,N,R,M) :- name(P,N), name(R,M), knows(P,R)
v4(P,N,G,R,C) :- name(P,N), gender(P,G), knows(P,R), member(P,C)
v5(P,N,R,C,L) :- name(P,N), knows(P,R), member(P,C), label(C,L)
6. Compute Buckets
Query:
Q(P,C,WKP,N) :- member(P,C), label(C,"Semantic Web"), knows(P,WKP), name(WKP,"Barack Obama")
LAV mappings:
v1(P,A,I,C,L) :- made(P,A), affiliation(P,I), member(P,C), label(C,L)
v2(A,T,P,N,C) :- title(A,T), made(P,A), name(P,N), member(P,C)
v3(P,N,R,M) :- name(P,N), name(R,M), knows(P,R)
v4(P,N,G,R,C) :- name(P,N), gender(P,G), knows(P,R), member(P,C)
v5(P,N,R,C,L) :- name(P,N), knows(P,R), member(P,C), label(C,L)
Buckets (one column per query subgoal):
member(P,C)     label(C,L)      knows(P,WKP)    name(WKP,N)
v1(P,A,I,C,L)   v1(P,A,I,C,L)   v3(P,N,R,M)     v2(A,T,P,N,C)
v2(A,T,P,N,C)   v5(P,N,R,C,L)   v4(P,N,G,R,C)   v3(P,N,R,M)
v4(P,N,G,R,C)                   v5(P,N,R,C,L)   v4(P,N,G,R,C)
v5(P,N,R,C,L)                                   v5(P,N,R,C,L)
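The bucket computation for this example can be sketched in a few lines of Python. This is a simplified illustration, not the SemLAV implementation: it assumes a view belongs to a bucket whenever its body mentions the subgoal's predicate, ignores argument unification, and uses hypothetical names (VIEWS, QUERY, compute_buckets).

```python
# Each view is reduced to the list of predicates in its body (from the LAV mappings).
# Assumption: matching on predicate names only, without unifying arguments.
VIEWS = {
    "v1": ["made", "affiliation", "member", "label"],
    "v2": ["title", "made", "name", "member"],
    "v3": ["name", "name", "knows"],
    "v4": ["name", "gender", "knows", "member"],
    "v5": ["name", "knows", "member", "label"],
}

# Predicates of the four query subgoals, in the order used on the slide.
QUERY = ["member", "label", "knows", "name"]


def compute_buckets(query, views):
    """Map each query subgoal to the views whose body mentions its predicate."""
    return {
        subgoal: [name for name, body in views.items() if subgoal in body]
        for subgoal in query
    }


buckets = compute_buckets(QUERY, VIEWS)
for subgoal, bucket in buckets.items():
    print(subgoal, bucket)
```

Under these assumptions the four buckets contain the same views as the four columns of the table.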
7. Bottleneck of the LAV approach
• A LAV mediator relies on a query rewriter to translate a mediator query into a union of queries against the views.
• In the worst case, the number of candidate rewritings is (M × |V|)^N, where N is the number of query subgoals, M is the maximum number of subgoals in a view, and V is the set of views.
– For the simple query example -> 96 candidate rewritings
– For a more complex query -> millions of rewritings
• Problems:
– Cannot execute all rewritings
– Cannot guess which rewritings could produce results
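The figure of 96 for the example can be checked with a short calculation. The sketch below assumes that a candidate rewriting picks one view from each bucket, so the count is the product of the bucket sizes from slide 6; the variable names are illustrative only.

```python
from math import prod

# Bucket sizes read off the bucket table of the running example.
bucket_sizes = {"member": 4, "label": 2, "knows": 3, "name": 4}

# One view is chosen per bucket, so the candidates multiply.
candidate_rewritings = prod(bucket_sizes.values())

# Worst-case bound (M x |V|)^N for the same example:
# N = 4 query subgoals, M = 4 subgoals in the largest view, |V| = 5 views.
N, M, V = 4, 4, 5
worst_case_bound = (M * V) ** N

print(candidate_rewritings)  # 96
print(worst_case_bound)      # 160000
```

The product grows multiplicatively with every additional subgoal, which is why larger queries quickly reach millions of candidates.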
8. SemLAV Approach
• Do not generate rewritings
• Materialize relevant views and execute the original query
– Problem: there may be no time or no space to materialize all views
• Materialization order matters:
– Need to decide which views to materialize first
– We decide according to the number of "covered rewritings"
9. Ranking Relevant Views
Query:
Q(P,C,WKP,N) :- member(P,C), label(C,"Semantic Web"), knows(P,WKP), name(WKP,"Barack Obama")
LAV mappings:
v1(P,A,I,C,L) :- made(P,A), affiliation(P,I), member(P,C), label(C,L)
v2(A,T,P,N,C) :- title(A,T), made(P,A), name(P,N), member(P,C)
v3(P,N,R,M) :- name(P,N), name(R,M), knows(P,R)
v4(P,N,G,R,C) :- name(P,N), gender(P,G), knows(P,R), member(P,C)
v5(P,N,R,C,L) :- name(P,N), knows(P,R), member(P,C), label(C,L)
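Ranked buckets (one column per query subgoal):
member(P,C)     label(C,L)      knows(P,WKP)    name(WKP,N)
v5(P,N,R,C,L)   v5(P,N,R,C,L)   v5(P,N,R,C,L)   v5(P,N,R,C,L)
v4(P,N,G,R,C)   v1(P,A,I,C,L)   v4(P,N,G,R,C)   v4(P,N,G,R,C)
v1(P,A,I,C,L)                   v3(P,N,R,M)     v2(A,T,P,N,C)
v2(A,T,P,N,C)                                   v3(P,N,R,M)
Slide annotations: 4, 3, 2, 2. These match the number of buckets that contain v5, v4, v1 and v2, respectively.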
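One way to read the ranking is sketched below. It is a simplification under stated assumptions, not SemLAV's exact procedure: views are ordered by the number of buckets they appear in, ties are broken by view name (an assumption made here), and a rewriting counts as "covered" once every bucket has at least one loaded view.

```python
from math import prod

# Buckets of the running example, as sets of view names.
BUCKETS = {
    "member": {"v1", "v2", "v4", "v5"},
    "label": {"v1", "v5"},
    "knows": {"v3", "v4", "v5"},
    "name": {"v2", "v3", "v4", "v5"},
}


def covered_subgoals(view):
    """Number of buckets (query subgoals) in which the view appears."""
    return sum(view in bucket for bucket in BUCKETS.values())


def covered_rewritings(loaded):
    """Rewritings that use only loaded views: product of loaded views per bucket."""
    return prod(len(bucket & loaded) for bucket in BUCKETS.values())


all_views = set().union(*BUCKETS.values())
# Most subgoals covered first; ties broken alphabetically (assumption).
ranking = sorted(all_views, key=lambda v: (-covered_subgoals(v), v))

loaded = set()
coverage = []
for view in ranking:
    loaded.add(view)
    coverage.append(covered_rewritings(loaded))

print(ranking)   # ['v5', 'v4', 'v1', 'v2', 'v3']
print(coverage)  # [1, 8, 24, 48, 96]
```

Loading v5 alone already covers one rewriting, and the count reaches all 96 candidates once the five views are materialized.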
12. So SemLAV Works
[Figure: Number of answers produced by SemLAV and by randomly selected views during two minutes.]
13. Drawbacks of SemLAV
• Blocking execution strategy:
– Views are contacted one by one, in order.
– If v5 is huge...
• This impacts the performance of SemLAV:
– Throughput
– Time for first answer
– Total time
[Figure: views v1 to v5 loaded one after another]
14. View Loading and Query Execution
[Figure: sequential loading vs. parallel loading of views v1 to v5]
• A pool of 3 threads downloads views in parallel.
• When v1 is loaded and the query is executed:
– Expect more answers, sooner?
– But the number of triples grows much faster than with sequential loading.
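A pool of three download threads can be sketched with Python's standard library. Everything here is a stand-in: VIEW_DATA holds made-up triples, download replaces a real network fetch, and the integrated graph is a plain Python set that only the main thread modifies.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical in-memory views; sizes are arbitrary illustration values.
VIEW_DATA = {
    name: [(f"{name}/s{i}", "ex:p", f"{name}/o{i}") for i in range(size)]
    for name, size in [("v5", 5), ("v4", 4), ("v1", 3), ("v2", 2), ("v3", 1)]
}


def download(name):
    """Stand-in for fetching a view over the network."""
    return name, list(VIEW_DATA[name])


def load_parallel(ranked_views, workers=3):
    """Download views with a fixed-size thread pool and merge them as they arrive."""
    graph = set()
    completion_order = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(download, view) for view in ranked_views]
        for future in as_completed(futures):
            name, triples = future.result()
            graph.update(triples)          # merged in the main thread only
            completion_order.append(name)  # a query could be run here
    return graph, completion_order


graph, completion_order = load_parallel(["v5", "v4", "v1", "v2", "v3"])
print(len(graph), completion_order)
```

The completion order depends on thread scheduling, so a small view may become available before a large, higher-ranked one.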
15. View Loading and Query Execution
[Figure: sequential loading vs. parallel loading of views v1 to v5]
Loading data in parallel requires:
• Managing concurrent insertions into the integrated RDF graph
16. Concurrency Management
• Parallel insertion into a grow-only graph is a lock-free problem.
• However, existing RDF stores are designed for insertions, deletions and transactions.
• Hence, RDF stores poorly support parallel materialization of views (a dedicated RDF store is needed).
17. Parallel SemLAV (PS): Concurrency Model
– On top of Jena, we simulated a Single-Reader/Multiple-Writers (SRMW) strategy.
– Each view is divided into n blocks of 100 triples.
[Figure: views v1 to v5, each split into blocks of 100 triples]
• Could we get better performance with that alone?
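The block-wise writing scheme can be illustrated with a small Python sketch. It does not use Jena: GrowOnlyGraph is a hypothetical class, and a plain mutex stands in for whatever synchronization the SRMW simulation actually relies on. Writers insert one block of 100 triples at a time, so the reader can take a consistent snapshot between blocks.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 100  # block size stated on the slide


class GrowOnlyGraph:
    """Hypothetical insert-only triple container shared by several writers."""

    def __init__(self):
        self._triples = set()
        self._lock = threading.Lock()  # assumption: a mutex guards each block insert

    def insert_block(self, block):
        with self._lock:
            self._triples.update(block)

    def snapshot(self):
        """What the single reader (the query engine) would see right now."""
        with self._lock:
            return frozenset(self._triples)


def blocks(triples, size=BLOCK_SIZE):
    """Split a view into consecutive blocks of at most `size` triples."""
    for start in range(0, len(triples), size):
        yield triples[start:start + size]


def write_view(graph, triples):
    """Writer task: insert a view block by block; return the number of blocks."""
    count = 0
    for block in blocks(triples):
        graph.insert_block(block)
        count += 1
    return count


def make_view(name, size):
    """Generate made-up triples for a view of the given size."""
    return [(f"{name}/s{i}", "ex:p", f"{name}/o{i}") for i in range(size)]


graph = GrowOnlyGraph()
views = [make_view("v1", 250), make_view("v2", 100), make_view("v3", 30)]
with ThreadPoolExecutor(max_workers=3) as pool:
    block_counts = list(pool.map(lambda t: write_view(graph, t), views))

print(block_counts, len(graph.snapshot()))
```

Because triples are only ever added, a snapshot taken between two block inserts is always a subset of the final graph.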
18. When to execute the query?
• Why wait until a view is loaded to execute the query? Are other simple strategies possible? Which one is the best?
• Be careful:
– more query executions -> less loading
– fewer query executions -> more time for first results
• We define four execution strategies:
– View dependent (PS), time dependent (PS-TDC), data dependent (PS-DDC), two-phase execution (PS-DDC-ASK), (PS-TDC-ASK)
19. View Dependent Criterion (PS)
• The query engine is woken up
after a new view is completely
loaded.
[Figure: loading of views v1 to v5]
20. Time Dependent Criterion (PS-TDC)
• The query engine is woken up after a period of time t
– if t is n milliseconds, the query is executed every n milliseconds
[Figure: views v1 to v5 loading along a time axis marked 0, n, 2n, 3n, 4n]
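The effect of the time-dependent criterion can be simulated without real timers. The sketch below is illustrative only: insert_events is a made-up log of (timestamp in ms, triples inserted), and time_dependent_schedule is a hypothetical helper that reports how many triples the query engine would see at each wake-up.

```python
def time_dependent_schedule(insert_events, period):
    """Return (wake_time, triples_visible) for wake-ups every `period` ms.

    insert_events: list of (timestamp_ms, num_triples), sorted by timestamp.
    """
    if not insert_events:
        return []
    wakeups = []
    total = 0
    index = 0
    last_insert = insert_events[-1][0]
    now = period
    while True:
        # Account for every insertion that happened up to this wake-up.
        while index < len(insert_events) and insert_events[index][0] <= now:
            total += insert_events[index][1]
            index += 1
        wakeups.append((now, total))
        if now >= last_insert:
            break
        now += period
    return wakeups


# Made-up insertion log: four blocks of 100 triples arriving at irregular times.
events = [(10, 100), (25, 100), (40, 100), (95, 100)]
schedule = time_dependent_schedule(events, period=30)
print(schedule)  # [(30, 200), (60, 300), (90, 300), (120, 400)]
```

In this run the wake-up at 90 ms sees the same 300 triples as the one at 60 ms, so that execution cannot produce anything new.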
21. Data Dependent Criterion (PS-DDC)
• The query engine is woken up after a certain number n of triples have been inserted into the integrated RDF graph by the writers.
[Figure: views v1 to v5 loading along a data-size axis marked 0, n, 2n, 3n, 4n]
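A data-dependent trigger can be sketched as a counter that fires a callback every n newly inserted triples. The class and callback names are hypothetical, and the example is single-threaded for clarity; a real multi-writer setting would need synchronization around the counter.

```python
class DataDependentTrigger:
    """Run a query callback each time `threshold` new triples have been inserted."""

    def __init__(self, threshold, run_query):
        self.threshold = threshold
        self.run_query = run_query  # hypothetical callback receiving the graph
        self.graph = set()
        self._since_last_run = 0

    def insert(self, triples):
        for triple in triples:
            if triple in self.graph:
                continue  # duplicates do not count as new data
            self.graph.add(triple)
            self._since_last_run += 1
            if self._since_last_run >= self.threshold:
                self._since_last_run = 0
                self.run_query(self.graph)


# Record the graph size seen by each query execution.
sizes_at_execution = []
trigger = DataDependentTrigger(100, lambda g: sizes_at_execution.append(len(g)))

# Insert 250 made-up triples in blocks of 100, 100 and 50.
triples = [(f"s{i}", "ex:p", f"o{i}") for i in range(250)]
for start in range(0, len(triples), 100):
    trigger.insert(triples[start:start + 100])

print(sizes_at_execution)  # [100, 200]
```

Unlike the time-dependent criterion, the engine is never woken up when nothing has been loaded since the last execution.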
22. Two-Phase Criterion (PS-DDC-ASK) and (PS-TDC-ASK)
• The first phase performs an ASK query to check for new results; if there are any, the second phase runs.
• The second phase executes the original query
– (PS-TDC-ASK) or (PS-DDC-ASK).
[Figure: views v1 to v5 loading; successive ASK checks return NO, NO, then YES]
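The two-phase idea can be illustrated with a tiny in-memory pattern matcher in place of a SPARQL engine. This is a sketch under assumptions: the slides do not say how "new" results are detected, so the example compares against bindings already returned, and match, ask_new and two_phase are hypothetical helpers.

```python
# The running query as triple patterns; terms starting with "?" are variables.
QUERY = [
    ("?P", "foaf:member", "?C"),
    ("?C", "rdfs:label", "Semantic Web"),
    ("?P", "foaf:knows", "?WKP"),
    ("?WKP", "foaf:name", "Barack Obama"),
]


def match(patterns, graph, binding=None):
    """Yield every variable binding that satisfies all patterns on the graph."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        compatible = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if candidate.setdefault(term, value) != value:
                    compatible = False
                    break
            elif term != value:
                compatible = False
                break
        if compatible:
            yield from match(rest, graph, candidate)


def key(binding):
    return tuple(sorted(binding.items()))


def ask_new(graph, seen):
    """Phase 1: stop at the first solution that has not been returned yet."""
    return any(key(b) not in seen for b in match(QUERY, graph))


def two_phase(graph, seen):
    """Phase 2 runs the full query only when phase 1 answers yes."""
    if not ask_new(graph, seen):
        return []
    fresh = [b for b in match(QUERY, graph) if key(b) not in seen]
    seen.update(key(b) for b in fresh)
    return fresh


graph, seen, answers = set(), set(), []
steps = [
    [("ex:alice", "foaf:member", "ex:sw"), ("ex:sw", "rdfs:label", "Semantic Web")],
    [("ex:alice", "foaf:knows", "ex:obama")],
    [("ex:obama", "foaf:name", "Barack Obama")],
]
for new_triples in steps:
    graph.update(new_triples)
    answers.append(len(two_phase(graph, seen)))

print(answers)  # [0, 0, 1]
```

The first two checks answer no and skip the full query; only the third, after the name triple arrives, triggers the second phase.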
23. Experimental Evaluation
• Implement and compare with SemLAV:
– Berlin Benchmark: 10,000,736 triples
– 16 queries (out of 18), 510 views
– Linux server with 128 GB of memory and 124 processors; 20 GB of RAM are allocated for the experiments.
• For parallel SemLAV (PS):
– Threads are executed in parallel to download views
– Different numbers of threads: 5, 10 and 20
– More information in the paper and on the project website:
https://sites.goole.com/site/sematiclav
32. Conclusion and Future Work
• Parallel processing of SPARQL queries using LAV
Views.
• The new execution strategies outperform SemLAV in
terms of throughput and total time.
• Trade-off between throughput and time for first
answer.
• In the future:
– Build a grow-only RDF store to better support parallel
loading
– Incremental evaluation of the query relying on view
update…