Parallel Data Loading during Querying Deep Web and Linked Open Data with SPARQL
Pauline Folz (1,2), Gabriela Montoya (1,3), Hala Skaf-Molli (1), Pascal Molli (1) and Maria-Esther Vidal (4)
1 LINA, Nantes University, France
2 Nantes Métropole, Direction Recherche, Innovation et Enseignement Supérieur
3 Centre National de la Recherche Scientifique (CNRS), France
4 Universidad Simon Bolivar, Venezuela
SSWS2015@ISWC2015
Querying Linked Open Data with SPARQL
• Who in the Semantic Web community knows a well-known person?
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT *
WHERE {
  ?P foaf:member ?C .
  ?C rdfs:label "Semantic Web" .
  ?P foaf:knows ?WKP .
  ?WKP foaf:name ?N .
  FILTER(?N = "Barack Obama")
}
• Evaluated over LOD data sources only: no results.
Querying Deep Web and Linked Open Data with SPARQL
• The same query, evaluated over LOD data sources together with Deep Web data sources: results!
P. Folz, G. Montoya, H. Skaf-Molli, P. Molli, and M. Vidal. SemLAV: Querying deep web and linked open data with SPARQL. Demo, ESWC 2014, Revised Selected Papers, pages 332–337, 2014.
Video available at: https://www.youtube.com/watch?v=z7w31f-ybuQ
SemLAV: Local-As-View Mediation for SPARQL
G. Montoya, L. D. Ibánez, H. Skaf-Molli, P. Molli, and M.-E. Vidal. SemLAV: Local-As-View Mediation for SPARQL. Transactions on Large-Scale Data- and Knowledge-Centered Systems, LNCS, Vol. 8420, pages 33–58, 2014.
Query:
Q(P,C,WKP,N) :- member(P,C), label(C,"Semantic Web"), knows(P,WKP), name(WKP,"Barack Obama")
LAV mappings:
v1(P,A,I,C,L) :- made(P,A), affiliation(P,I), member(P,C), label(C,L)
v2(A,T,P,N,C) :- title(A,T), made(P,A), name(P,N), member(P,C)
v3(P,N,R,M) :- name(P,N), name(R,M), knows(P,R)
v4(P,N,G,R,C) :- name(P,N), gender(P,G), knows(P,R), member(P,C)
v5(P,N,R,C,L) :- name(P,N), knows(P,R), member(P,C), label(C,L)
Compute Buckets
• Each query sub-goal gets a bucket holding the views that can contribute to it (query and LAV mappings as above):
member(P,C)    label(C,L)     knows(P,WKP)   name(WKP,N)
v1(P,A,I,C,L)  v1(P,A,I,C,L)  v3(P,N,R,M)    v2(A,T,P,N,C)
v2(A,T,P,N,C)  v5(P,N,R,C,L)  v4(P,N,G,R,C)  v3(P,N,R,M)
v4(P,N,G,R,C)                 v5(P,N,R,C,L)  v4(P,N,G,R,C)
v5(P,N,R,C,L)                                v5(P,N,R,C,L)
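The bucket computation above can be sketched in a few lines of Python. This is a simplified illustration, not the SemLAV implementation: it matches sub-goals to views by predicate name only and ignores variable unification, which a real bucket algorithm would also check. The names `QUERY_SUBGOALS`, `VIEWS` and `compute_buckets` are made up for this example.

```python
# Predicates of the query sub-goals, in the order used on the slide.
QUERY_SUBGOALS = ["member", "label", "knows", "name"]

# LAV mappings from the slide, reduced to the predicates in each view body.
VIEWS = {
    "v1": ["made", "affiliation", "member", "label"],
    "v2": ["title", "made", "name", "member"],
    "v3": ["name", "name", "knows"],
    "v4": ["name", "gender", "knows", "member"],
    "v5": ["name", "knows", "member", "label"],
}


def compute_buckets(subgoals, views):
    """Map each query sub-goal to the views whose body mentions its predicate.

    Simplification (assumption): matching is done on predicate names only.
    """
    return {goal: [v for v, body in views.items() if goal in body]
            for goal in subgoals}


buckets = compute_buckets(QUERY_SUBGOALS, VIEWS)
for goal, relevant in buckets.items():
    print(f"{goal:7s} -> {', '.join(relevant)}")
```

The resulting buckets contain 4, 2, 3 and 4 views, matching the columns of the table.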
Bottleneck of the LAV Approach
• A LAV mediator relies on a query rewriter to translate a mediator query into a union of queries against the views.
• In the worst case, the number of candidate rewritings is (M × |V|)^N, where N is the number of query sub-goals, M the maximal number of sub-goals in a view, and V the set of views.
– For the simple example query: 96 candidate rewritings
– For a more complex query: millions of rewritings
• Problems:
– Cannot execute all rewritings
– Cannot guess which rewritings could produce results
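As a worked check of these numbers, the short sketch below multiplies the bucket sizes of the example query (read off the Compute Buckets slide) and evaluates the worst-case bound for the same example. The variable names are illustrative only.

```python
from math import prod

# Bucket sizes for the example query: one bucket per query sub-goal.
bucket_sizes = {"member": 4, "label": 2, "knows": 3, "name": 4}

# A candidate rewriting picks one view per bucket, so the count is the product.
candidate_rewritings = prod(bucket_sizes.values())

# Worst-case bound (M x |V|)^N for the same example:
N = 4  # query sub-goals
M = 4  # maximal number of sub-goals in a view (v1, v2, v4 and v5 have four)
V = 5  # number of views
worst_case = (M * V) ** N

print(candidate_rewritings)  # 96
print(worst_case)            # 160000
```

Even for this five-view example the bound is three orders of magnitude above the 96 actual candidates, and both grow quickly with the number of views.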
SemLAV Approach
• Do not generate rewritings
• Materialize relevant views and execute the original query
– Problem: there may be neither the time nor the space to materialize all views
• Materialization order matters:
– Need to decide which views to materialize first
– We decide according to the number of "covered rewritings"
Ranking Relevant Views
• Buckets with views sorted by rank (query and LAV mappings as above):
member(P,C)    label(C,L)     knows(P,WKP)   name(WKP,N)
v5(P,N,R,C,L)  v5(P,N,R,C,L)  v5(P,N,R,C,L)  v5(P,N,R,C,L)
v4(P,N,G,R,C)  v1(P,A,I,C,L)  v4(P,N,G,R,C)  v4(P,N,G,R,C)
v1(P,A,I,C,L)                 v3(P,N,R,M)    v2(A,T,P,N,C)
v2(A,T,P,N,C)                                v3(P,N,R,M)
• Counts shown on the slide: 4, 3, 2, 2 (v5 appears in 4 buckets, v4 in 3, the other views in 2 each)
Materialization Order Matters
                      SemLAV ranking                               Random order
# Included views (k)  Included views (Vk)  # Covered rewritings    Included views (Vk)  # Covered rewritings
1                     v5                   1 × 1 × 1 × 1 = 1       v1                   1 × 1 × 0 × 0 = 0
2                     v5, v4               2 × 1 × 2 × 2 = 8       v1, v2               2 × 1 × 0 × 1 = 0
3                     v5, v4, v1           3 × 2 × 2 × 2 = 24      v1, v2, v3           2 × 1 × 1 × 2 = 4
4                     v5, v4, v1, v3       3 × 2 × 3 × 3 = 54      v1, v2, v3, v4       3 × 1 × 2 × 3 = 18
5                     v5, v4, v1, v3, v2   4 × 2 × 3 × 4 = 96      v1, v2, v3, v4, v5   4 × 2 × 3 × 4 = 96
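The covered-rewritings figures in this comparison can be recomputed with a small sketch: for a set of included views, multiply, over all buckets, the number of included views present in each bucket. The bucket contents come from the slides; `covered_rewritings` and `coverage_curve` are hypothetical helper names.

```python
from math import prod

# Buckets of the example query (views relevant to each sub-goal).
BUCKETS = {
    "member": {"v1", "v2", "v4", "v5"},
    "label": {"v1", "v5"},
    "knows": {"v3", "v4", "v5"},
    "name": {"v2", "v3", "v4", "v5"},
}


def covered_rewritings(included, buckets=BUCKETS):
    """Number of rewritings that use only the already-included views."""
    included = set(included)
    return prod(len(views & included) for views in buckets.values())


def coverage_curve(order):
    """Covered rewritings after materializing the first k views, k = 1..len(order)."""
    return [covered_rewritings(order[:k]) for k in range(1, len(order) + 1)]


semlav_curve = coverage_curve(["v5", "v4", "v1", "v3", "v2"])
random_curve = coverage_curve(["v1", "v2", "v3", "v4", "v5"])
print(semlav_curve)  # [1, 8, 24, 54, 96]
print(random_curve)  # [0, 0, 4, 18, 96]
```

Both orders end at 96, but the ranked order already covers rewritings after the first view, whereas the example random order covers none until the third.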
Query processing over materialized views
[Figure: the ranked buckets, with views v5, v4 and v1 grouped separately from v2 and v3]
So SemLAV Works
[Figure: number of answers produced by SemLAV and by randomly selected views during two minutes]
Drawbacks of SemLAV
• Blocking execution strategy:
– Views are contacted one by one, in order.
– If v5 is huge, the remaining views wait until it is loaded.
• Impact on the performance of SemLAV:
– Throughput
– Time for first answer
– Total time
View Loading and Query Execution
[Figure: sequential loading (v5, v4, v1, v2, v3) versus parallel loading (v5, v1, v2, v4, v3)]
• A pool of 3 threads downloads views in parallel.
• Once v1 is loaded and the query is executed:
– Expect more answers, sooner?
– But the number of triples grows much faster than with sequential loading
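A pool of three download threads could look roughly like the sketch below. It is only an illustration under stated assumptions: `fetch_view` is a hypothetical stand-in that fabricates triples instead of contacting a real source, the view sizes are invented, and the merge into the graph happens in the main thread, so no locking is needed at this stage.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_view(name, size):
    """Hypothetical stand-in for downloading a view: returns synthetic triples."""
    return name, [(f"{name}:s{i}", "ex:p", f"{name}:o{i}") for i in range(size)]


# Views in ranked order with made-up sizes (number of triples).
RANKED_VIEWS = [("v5", 300), ("v4", 120), ("v1", 80), ("v3", 60), ("v2", 40)]


def load_in_parallel(views, workers=3):
    """Download views with a fixed-size thread pool and merge them as they finish."""
    graph = set()
    completion_order = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetch_view, name, size) for name, size in views]
        for future in as_completed(futures):
            name, triples = future.result()
            graph.update(triples)          # merged by the main thread only
            completion_order.append(name)  # small views may overtake large ones
    return graph, completion_order


graph, completion_order = load_in_parallel(RANKED_VIEWS)
print(len(graph), completion_order)
```

Views are submitted in ranked order, but they complete in whatever order the downloads finish, which is why the parallel column of the figure differs from the sequential one.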
• Loading data in parallel requires managing concurrent insertions into the integrated RDF graph.
Concurrency Management
• Parallel insertion into a grow-only graph can be handled without locks.
• However, existing RDF stores are designed for inserts, deletes and transactions.
• Hence, RDF stores poorly support parallel materialization of views (a dedicated RDF store is needed).
Parallel SemLAV (PS): Concurrency Model
– We simulated a Single-Reader/Multiple-Writers (SRMW) strategy on top of Jena.
– Each view is divided into n blocks of 100 triples.
[Figure: views v5, v1, v2, v4, v3, each split into blocks of 100 triples]
• Could this alone give better performance?
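The block-wise writer scheme can be illustrated with a toy in-memory graph. This is not the Jena-based implementation from the talk: the class `GrowOnlyGraph`, the synthetic triples and the view sizes are assumptions, and a plain mutex is used for simplicity rather than a lock-free structure. Only the block size of 100 comes from the slide.

```python
import threading

BLOCK_SIZE = 100  # block size stated on the slide


class GrowOnlyGraph:
    """Toy grow-only triple set shared by several writers and one reader."""

    def __init__(self):
        self._triples = set()
        self._lock = threading.Lock()  # simplification: a mutex, not lock-free

    def insert_block(self, block):
        with self._lock:
            self._triples.update(block)

    def snapshot(self):
        """What the single reader (the query engine) would see right now."""
        with self._lock:
            return frozenset(self._triples)


def blocks(triples, size=BLOCK_SIZE):
    """Split a list of triples into consecutive blocks of at most `size`."""
    for start in range(0, len(triples), size):
        yield triples[start:start + size]


def writer(graph, view_name, n_triples):
    """Writer thread: inserts one (synthetic) view block by block."""
    triples = [(f"{view_name}:s{i}", "ex:p", f"{view_name}:o{i}")
               for i in range(n_triples)]
    for block in blocks(triples):
        graph.insert_block(block)


graph = GrowOnlyGraph()
threads = [threading.Thread(target=writer, args=(graph, name, size))
           for name, size in [("v5", 250), ("v1", 130), ("v2", 90)]]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(graph.snapshot()))  # 470
```

Because writers hold the lock only for one block at a time, the reader can interleave between blocks instead of waiting for a whole view.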
When to execute the query?
• Why wait until a view is loaded to execute the query? Are other simple strategies possible? Which one is best?
• Be careful:
– more query executions -> less loading
– fewer query executions -> more time before the first results
• We define four execution strategies:
– View dependent (PS), Time dependent (PS-TDC), Data dependent (PS-DDC), Two-phase execution (PS-DDC-ASK, PS-TDC-ASK)
View Dependent Criterion (PS)
• The query engine is woken up each time a new view is completely loaded.
Time Dependent Criterion (PS-TDC)
• The query engine is woken up after a period of time t
– if t is n milliseconds, the query is executed every n milliseconds
[Figure: views v5, v1, v2, v4, v3 loading along a time axis marked 0, n, 2n, 3n, 4n]
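To make the time-dependent criterion concrete, the sketch below replays a made-up loading trace and reports how many triples the query engine would see at each wake-up. It is a deterministic simulation rather than a real timer, and the function name `tdc_wakeups` and the event trace are assumptions for illustration.

```python
def tdc_wakeups(load_events, period_ms):
    """Simulate the time-dependent criterion.

    load_events: list of (timestamp_ms, n_triples_inserted), sorted by time.
    period_ms:   positive wake-up period of the query engine.
    Returns a list of (wake_time_ms, triples_visible_at_that_time).
    """
    if not load_events:
        return []
    last_event_time = load_events[-1][0]
    wakeups = []
    visible = 0
    next_event = 0
    tick = period_ms
    while True:
        # Account for every insertion that happened up to this tick.
        while next_event < len(load_events) and load_events[next_event][0] <= tick:
            visible += load_events[next_event][1]
            next_event += 1
        wakeups.append((tick, visible))
        if tick >= last_event_time:
            break
        tick += period_ms
    return wakeups


# Made-up trace: blocks of 100 triples arriving at irregular times.
events = [(100, 100), (450, 100), (700, 100), (1200, 100)]
schedule = tdc_wakeups(events, period_ms=500)
print(schedule)  # [(500, 200), (1000, 300), (1500, 400)]
```

A shorter period gives earlier first answers but more query executions competing with loading, which is the trade-off noted on the "When to execute the query?" slide.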
Data Dependent Criterion (PS-DDC)
• The query engine is woken up each time a certain number n of triples has been inserted into the integrated RDF graph by the writers.
[Figure: views v5, v1, v2, v4, v3 loading along a data-size axis marked 0, n, 2n, 3n, 4n]
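A data-dependent trigger can be sketched as a counter that fires a callback once at least n triples have arrived since the last execution. The class name `DataDependentTrigger` and the callback interface are assumptions made for this example; the threshold of 500 mirrors the value used later in the experiments.

```python
class DataDependentTrigger:
    """Wake the query engine after at least n newly inserted triples."""

    def __init__(self, n, run_query):
        self.n = n
        self.run_query = run_query  # callback standing in for the query engine
        self.pending = 0            # triples inserted since the last execution

    def on_insert(self, count):
        """Called by the writers after inserting `count` triples."""
        self.pending += count
        if self.pending >= self.n:
            self.pending = 0
            self.run_query()


# Usage: 12 blocks of 100 triples with n = 500.
executions = []   # graph size observed at each query execution
graph_size = 0
trigger = DataDependentTrigger(500, lambda: executions.append(graph_size))
for _ in range(12):
    graph_size += 100
    trigger.on_insert(100)
print(executions)  # [500, 1000]
```

In a multi-threaded setting the counter itself would need to be updated atomically; the sketch leaves that out to keep the criterion visible.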
Two-Phase Criterion (PS-DDC-ASK) and (PS-TDC-ASK)
• The first phase performs an ASK query to check for new results; if there are any, the second phase runs.
• The second phase executes the original query.
– The check is triggered by the time criterion (PS-TDC-ASK) or by the data criterion (PS-DDC-ASK).
[Figure: views v5, v1, v2, v4, v3 loading; successive checks: ASK -> no, ASK -> no, ASK -> yes]
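The two-phase idea, a cheap existence check before the full query, can be illustrated on a toy triple set. This is a sketch under assumptions: `ask` and `full_query` are hand-written stand-ins for the ASK and SELECT forms of a reduced two-pattern query, and the triples are invented.

```python
def named_obama(graph):
    """Subjects whose name is "Barack Obama"."""
    return {s for (s, p, o) in graph if p == "name" and o == "Barack Obama"}


def ask(graph):
    """Phase 1 stand-in for ASK: True as soon as one solution exists."""
    targets = named_obama(graph)
    return any(p == "knows" and o in targets for (s, p, o) in graph)


def full_query(graph):
    """Phase 2 stand-in for the original query: all (person, well-known person) pairs."""
    targets = named_obama(graph)
    return {(s, o) for (s, p, o) in graph if p == "knows" and o in targets}


def two_phase(graph, log):
    """Run the full query only when the ASK check succeeds."""
    if not ask(graph):
        log.append("ASK -> no")
        return set()
    log.append("ASK -> yes")
    return full_query(graph)


# The graph grows in three steps; only the last step makes the query answerable.
graph, log, answers = set(), [], set()
steps = [
    {("alice", "member", "swc")},
    {("alice", "knows", "bob")},
    {("bob", "name", "Barack Obama")},
]
for new_triples in steps:
    graph |= new_triples
    answers = two_phase(graph, log)
print(log)      # ['ASK -> no', 'ASK -> no', 'ASK -> yes']
print(answers)  # {('alice', 'bob')}
```

The first two wake-ups cost only the existence check, mirroring the "ASK -> no, ASK -> no, ASK -> yes" sequence in the figure.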
Experimental Evaluation
• Implementation compared with SemLAV:
– Berlin SPARQL Benchmark (BSBM): 10,000,736 triples
– 16 queries (out of 18), 510 views
– Linux server with 128 GB of memory and 124 processors; 20 GB of RAM allocated to the experiments
• For parallel SemLAV (PS):
– Threads are executed in parallel to download views
– Different numbers of threads: 5, 10 and 20
– More information in the paper and on the project website: https://sites.goole.com/site/sematiclav
[Figure: results on BSBM using the View Dependent Criterion (PS)]
[Figure: results on BSBM using the Time Dependent Criterion (PS-TDC); queries are executed every 500 ms]
[Figure: results on BSBM using the Data Dependent Criterion (PS-DDC); queries are executed after every 500 inserted triples]
[Figure: results on BSBM using the PS-DDC-ASK strategy; queries are executed whenever 500 triples have been inserted into the integrated RDF graph]
So what?
• Better total time for parallel SemLAV, but no dominant strategy
• Better throughput for parallel SemLAV, but no dominant strategy
[Figure: time for first answer]
Conclusion and Future Work
• Parallel processing of SPARQL queries using LAV views.
• The new execution strategies outperform SemLAV in terms of throughput and total time.
• Trade-off between throughput and time for first answer.
• In the future:
– Build a grow-only RDF store to better support parallel loading
– Incremental evaluation of the query relying on view updates…
