SlideShare une entreprise Scribd logo
1  sur  17
Télécharger pour lire hors ligne
More Complete Resultset Retrieval from Large
Heterogeneous RDF Sources
Andre Valdestilhas Tommaso Soru Muhammad Saleem
AKSW Group, University of Leipzig, Germany
November 24, 2019
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 1 / 17
Outline
Motivation
Approach
Experiments
Conclusion & Future work
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 2 / 17
Motivation
The LOD Cloud
2007
2009
2019
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 3 / 17
Motivation
Where to find RDF datasets?
9,960
raw RDF datasets658,206
Datasets (HDT files)
LODLaundromat
Which Dataset?
...
559
Endpoint
Different formats1
Query more than 221 billion triples (> 5 Terabytes)
1Serialization.
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 4 / 17
Example
Where to find RDF datasets?
Authors that have a paper type poster/demo in the proceedings of ISWC
20082
2Query from FEDBench
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 5 / 17
Example
Where to find RDF datasets?
Authors that have a paper type poster/demo in the proceedings of ISWC
20083
4 HDT datasets4
containing data that can answer the query
3Query from FEDBench
4Semantic Web Dog Food from LOD Laundromat datasets.
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 6 / 17
Motivation
Approaches
(+) Multiple SPARQL endpoints
(-) 90% are dump files
(+) Dereferenceable URIs
(-) 43% of the URIs are
non-dereferenceable
Endpoint HDT	file file.rdf Dump_file_2
WIMU
Where is my URI?
(+) Data from non-dereferenceable
URIs
(-) No SPARQL query
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 7 / 17
The approach
A hybrid SPARQL query engine
Collect data from multiple SPARQL endpoints,
Data from RDF dumps including HDT files and use Link Traversal
Link Traversal, obtaining data from non-dereferenceable URIs using WIMUa
aWhere is my URI?(WIMU) http://wimu.aksw.org/
Resulting in
More complete results
Experiments with 3 state-of-the-art SPARQL query benchmarks,
LargeRDFBench, FedBench and FEASIBLE
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 8 / 17
The approach
Select ?p ?o
Where {<http://uri.com> ?p ?o}
Endpoint
hdt file
dump.bz2
file.rdf
...
http://uri1.com
http://uri2.com
http://uriN.com
Extract URIs
WIMU
1
2
3
Data Dumps
Query processor
Traversal Based
Query processor
Union of
the results
Source Filtering
SPARQL-a-lot
Query processor
SPARQL Endpoint 
Query processor
wimuQ query
execution engine
Results
<subject1><predicate1><object1>
<subject2><predicate2><object2>
<subjectN><predicateN><objectN>
4
5
6
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 9 / 17
The approach
The source selection
Identify relevant datasets from WIMU
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 10 / 17
Evaluation
Hypothesis Identify automatically relevant sources from heterogeneous RDF
data, even with non-dereferenceable URIs, can improve the
resultset retrieval
Metrics Coverage and runtime
Approaches FedX (endpoints), SQUIN (Traversal-based), SPARQL-a-lot and
WIMU(dumps)
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 11 / 17
Evaluation
Experimental setup
Datasets 221.7 billion triples (>5 terabytes)
Queries 415 queries from FedBench, LargeRDFBench and FEASIBLE
Each query executed 5 times
Hardware 200 GB HD, 8GB RAM, 2.70GHZ single core processor
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 12 / 17
Evaluation
Coverage: Overall 76% queries with results(Zero results=non-public endpoints/data -
non-indexed)
CD LS LD Simple Comp Large Chs Dbpedia SWDF
FedBench | |LargeRDFBench Feasible
100
1000
10000
100000
Averagenumberofresults
onlogscale
EndPoints(FedX) SPARQL-a-lot LinkTraversal(SQUIN) wimuDumps
wimuQ
Approaches and the best coverage
FedBench 55% endpoints
LargeRDFBench 81% wimuDumps
FEASIBLE 98% wimuDumps
Observation
The combination of those query
processing engines implies more resultset
retrieval
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 13 / 17
Evaluation
Number of datasets
More datasets discovered does not implies in more results
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 14 / 17
Evaluation
Runtime
Total Average 17 minutes across 3 benchmarks (wimuDumps 2 min, Endpoints
13 min, SPARQL-a-lot 58 sec, LinkTraversal-SQUIN 36 sec)
CD LS LD Simple Comp Large Chs Dbpedia SWDF
FedBench LargeRDFBench Feasible
| |
1
10
Averagerun-time(minutes)
onlogscale
EndPoints(FedX) SPARQL-a-lot LinkTraversal(SQUIN) wimuDumps
wimuQ
Interesting wimuQ takes 91% of results from wimuDumps, only 7% from
SPARQL endpoints. Possible reason, SPARQL endpoint
federation split among multiple endpoints, network and number
of intermediate results influence in the runtime
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 15 / 17
Conclusion & Future works
Conclusion
A hybrid SPARQL query processing engine to execute SPARQL queries over a
large amount of heterogeneous RDF data
Evaluation on real world datasets using the state of the art of federated and
non-federated query benchmarks (FedBench, LargeRDFBench and
FEASIBLE)
We present the first federated SPARQL query processing engine that executes
SPARQL queries over a total of 221.7 billion triples
Future work
Add more URIs into WIMU index and use Triple Pattern Fragments
A Large Scale approach to study the relation and similarity among the
datasets
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 16 / 17
That’s all Folks!
Thanks!
Questions?Github repository: https://github.com/firmao/wimuT
Prototype: https://w3id.org/wimuq/
Contact: valdestilhas@informatik.uni-leipzig.de
Special thanks to my PhD. advisor Prof. Dr. rer. nat. Thomas Riechert
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 17 / 17

Contenu connexe

Tendances

Ase2010 shang
Ase2010 shangAse2010 shang
Ase2010 shangSAIL_QU
 
Mining and Untangling Change Genealogies (PhD Defense Talk)
Mining and Untangling Change Genealogies (PhD Defense Talk)Mining and Untangling Change Genealogies (PhD Defense Talk)
Mining and Untangling Change Genealogies (PhD Defense Talk)Kim Herzig
 
RDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementationsRDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementationsJean-Paul Calbimonte
 
Lightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaLightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaAndy Petrella
 
Big data for SAS programmers
Big data for SAS programmersBig data for SAS programmers
Big data for SAS programmersKevin Lee
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystemSlideCentral
 
Opportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architecturesOpportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architecturesIan Foster
 
Functional Composition of Sensor Web APIs
Functional Composition of Sensor Web APIsFunctional Composition of Sensor Web APIs
Functional Composition of Sensor Web APIsRuben Verborgh
 
The Materials Project - Combining Science and Informatics to Accelerate Mater...
The Materials Project - Combining Science and Informatics to Accelerate Mater...The Materials Project - Combining Science and Informatics to Accelerate Mater...
The Materials Project - Combining Science and Informatics to Accelerate Mater...University of California, San Diego
 
Connecting Stream Reasoners on the Web
Connecting Stream Reasoners on the WebConnecting Stream Reasoners on the Web
Connecting Stream Reasoners on the WebJean-Paul Calbimonte
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light SourcesIan Foster
 
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science CloudAccelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science CloudGlobus
 
Triplewave: a step towards RDF Stream Processing on the Web
Triplewave: a step towards RDF Stream Processing on the WebTriplewave: a step towards RDF Stream Processing on the Web
Triplewave: a step towards RDF Stream Processing on the WebDaniele Dell'Aglio
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for ScienceIan Foster
 
Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li
Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting LiStanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li
Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting LiPacificResearchPlatform
 
A Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceA Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceGlobus
 

Tendances (20)

Friday talk 11.02.2011
Friday talk 11.02.2011Friday talk 11.02.2011
Friday talk 11.02.2011
 
Ase2010 shang
Ase2010 shangAse2010 shang
Ase2010 shang
 
Mining and Untangling Change Genealogies (PhD Defense Talk)
Mining and Untangling Change Genealogies (PhD Defense Talk)Mining and Untangling Change Genealogies (PhD Defense Talk)
Mining and Untangling Change Genealogies (PhD Defense Talk)
 
RDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementationsRDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementations
 
Lightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaLightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and Scala
 
Big data for SAS programmers
Big data for SAS programmersBig data for SAS programmers
Big data for SAS programmers
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystem
 
Opportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architecturesOpportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architectures
 
Functional Composition of Sensor Web APIs
Functional Composition of Sensor Web APIsFunctional Composition of Sensor Web APIs
Functional Composition of Sensor Web APIs
 
The Materials Project - Combining Science and Informatics to Accelerate Mater...
The Materials Project - Combining Science and Informatics to Accelerate Mater...The Materials Project - Combining Science and Informatics to Accelerate Mater...
The Materials Project - Combining Science and Informatics to Accelerate Mater...
 
Connecting Stream Reasoners on the Web
Connecting Stream Reasoners on the WebConnecting Stream Reasoners on the Web
Connecting Stream Reasoners on the Web
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light Sources
 
Dynamic Integrations of Crop Data and Corresponding Meteorological Data based...
Dynamic Integrations of Crop Data and Corresponding Meteorological Data based...Dynamic Integrations of Crop Data and Corresponding Meteorological Data based...
Dynamic Integrations of Crop Data and Corresponding Meteorological Data based...
 
Dynamic integrations of crop data and corresponding meteorological data based...
Dynamic integrations of crop data and corresponding meteorological data based...Dynamic integrations of crop data and corresponding meteorological data based...
Dynamic integrations of crop data and corresponding meteorological data based...
 
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science CloudAccelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
 
Triplewave: a step towards RDF Stream Processing on the Web
Triplewave: a step towards RDF Stream Processing on the WebTriplewave: a step towards RDF Stream Processing on the Web
Triplewave: a step towards RDF Stream Processing on the Web
 
HDF Town Hall
HDF Town HallHDF Town Hall
HDF Town Hall
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li
Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting LiStanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li
Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li
 
A Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceA Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials Science
 

Similaire à More Complete Resultset Retrieval from Large Heterogeneous RDF Sources

2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod GmodJun Zhao
 
Consuming Linked Data 4/5 Semtech2011
Consuming Linked Data 4/5 Semtech2011Consuming Linked Data 4/5 Semtech2011
Consuming Linked Data 4/5 Semtech2011Juan Sequeda
 
Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013François Belleau
 
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...Muhammad Saleem
 
Sustainable queryable access to Linked Data
Sustainable queryable access to Linked DataSustainable queryable access to Linked Data
Sustainable queryable access to Linked DataRuben Verborgh
 
Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...
Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...
Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...CIARD Movement
 
Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012François Belleau
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Webebiquity
 
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Dr.-Ing. Thomas Hartmann
 
RDF Stream Processing: Let's React
RDF Stream Processing: Let's ReactRDF Stream Processing: Let's React
RDF Stream Processing: Let's ReactJean-Paul Calbimonte
 
2010 03 Lodoxf Openflydata
2010 03 Lodoxf Openflydata2010 03 Lodoxf Openflydata
2010 03 Lodoxf OpenflydataJun Zhao
 
2008 11 13 Hcls Call
2008 11 13 Hcls Call2008 11 13 Hcls Call
2008 11 13 Hcls CallJun Zhao
 
rdf query reformulation
rdf query reformulationrdf query reformulation
rdf query reformulationINRIA-OAK
 
Using Architectures for Semantic Interoperability to Create Journal Clubs for...
Using Architectures for Semantic Interoperability to Create Journal Clubs for...Using Architectures for Semantic Interoperability to Create Journal Clubs for...
Using Architectures for Semantic Interoperability to Create Journal Clubs for...James Powell
 
Re-using Media on the Web: Media fragment re-mixing and playout
Re-using Media on the Web: Media fragment re-mixing and playoutRe-using Media on the Web: Media fragment re-mixing and playout
Re-using Media on the Web: Media fragment re-mixing and playoutMediaMixerCommunity
 

Similaire à More Complete Resultset Retrieval from Large Heterogeneous RDF Sources (20)

2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod Gmod
 
Consuming Linked Data 4/5 Semtech2011
Consuming Linked Data 4/5 Semtech2011Consuming Linked Data 4/5 Semtech2011
Consuming Linked Data 4/5 Semtech2011
 
Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013
 
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
 
Eswc2018 wimu slides
Eswc2018 wimu slidesEswc2018 wimu slides
Eswc2018 wimu slides
 
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
 
Bio2RDF @ W3C HCLS2009
Bio2RDF @ W3C HCLS2009Bio2RDF @ W3C HCLS2009
Bio2RDF @ W3C HCLS2009
 
Sustainable queryable access to Linked Data
Sustainable queryable access to Linked DataSustainable queryable access to Linked Data
Sustainable queryable access to Linked Data
 
Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...
Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...
Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...
 
On a web of data streams
On a web of data streamsOn a web of data streams
On a web of data streams
 
Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Web
 
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
 
RDF Stream Processing: Let's React
RDF Stream Processing: Let's ReactRDF Stream Processing: Let's React
RDF Stream Processing: Let's React
 
2010 03 Lodoxf Openflydata
2010 03 Lodoxf Openflydata2010 03 Lodoxf Openflydata
2010 03 Lodoxf Openflydata
 
2008 11 13 Hcls Call
2008 11 13 Hcls Call2008 11 13 Hcls Call
2008 11 13 Hcls Call
 
rdf query reformulation
rdf query reformulationrdf query reformulation
rdf query reformulation
 
Using Architectures for Semantic Interoperability to Create Journal Clubs for...
Using Architectures for Semantic Interoperability to Create Journal Clubs for...Using Architectures for Semantic Interoperability to Create Journal Clubs for...
Using Architectures for Semantic Interoperability to Create Journal Clubs for...
 
Re-using Media on the Web: Media fragment re-mixing and playout
Re-using Media on the Web: Media fragment re-mixing and playoutRe-using Media on the Web: Media fragment re-mixing and playout
Re-using Media on the Web: Media fragment re-mixing and playout
 
Bio2RDF@BH2010
Bio2RDF@BH2010Bio2RDF@BH2010
Bio2RDF@BH2010
 

Plus de André Valdestilhas

NaturalMSEQueries_presICWI2023.pdf
NaturalMSEQueries_presICWI2023.pdfNaturalMSEQueries_presICWI2023.pdf
NaturalMSEQueries_presICWI2023.pdfAndré Valdestilhas
 
Cedal slides. Web Inteligence 2017
Cedal slides. Web Inteligence 2017Cedal slides. Web Inteligence 2017
Cedal slides. Web Inteligence 2017André Valdestilhas
 
Iswc 2016 completeness correctude
Iswc 2016 completeness correctudeIswc 2016 completeness correctude
Iswc 2016 completeness correctudeAndré Valdestilhas
 
Emotion-oriented computing: Possible uses and resources
Emotion-oriented computing: Possible uses and resourcesEmotion-oriented computing: Possible uses and resources
Emotion-oriented computing: Possible uses and resourcesAndré Valdestilhas
 
Emotion-oriented computing: Possible uses and applications
Emotion-oriented computing: Possible uses and  applicationsEmotion-oriented computing: Possible uses and  applications
Emotion-oriented computing: Possible uses and applicationsAndré Valdestilhas
 
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...André Valdestilhas
 

Plus de André Valdestilhas (11)

NaturalMSEQueries_presICWI2023.pdf
NaturalMSEQueries_presICWI2023.pdfNaturalMSEQueries_presICWI2023.pdf
NaturalMSEQueries_presICWI2023.pdf
 
Presentation MSE 2022
Presentation MSE 2022Presentation MSE 2022
Presentation MSE 2022
 
Cedal slides. Web Inteligence 2017
Cedal slides. Web Inteligence 2017Cedal slides. Web Inteligence 2017
Cedal slides. Web Inteligence 2017
 
WASOTA (poster)
WASOTA (poster)WASOTA (poster)
WASOTA (poster)
 
Iswc 2016 completeness correctude
Iswc 2016 completeness correctudeIswc 2016 completeness correctude
Iswc 2016 completeness correctude
 
DBpediaSameAs
DBpediaSameAsDBpediaSameAs
DBpediaSameAs
 
DBpediaSameAs
DBpediaSameAsDBpediaSameAs
DBpediaSameAs
 
Emotion-oriented computing: Possible uses and resources
Emotion-oriented computing: Possible uses and resourcesEmotion-oriented computing: Possible uses and resources
Emotion-oriented computing: Possible uses and resources
 
Using semiotic profile
Using semiotic profileUsing semiotic profile
Using semiotic profile
 
Emotion-oriented computing: Possible uses and applications
Emotion-oriented computing: Possible uses and  applicationsEmotion-oriented computing: Possible uses and  applications
Emotion-oriented computing: Possible uses and applications
 
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
 

Dernier

lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lodhisaajjda
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Kayode Fayemi
 
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...Delhi Call girls
 
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort ServiceDelhi Call girls
 
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxNikitaBankoti2
 
Dreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIIDreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIINhPhngng3
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardsticksaastr
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar TrainingKylaCullinane
 
Air breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animalsAir breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animalsaqsarehman5055
 
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxMohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxmohammadalnahdi22
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoKayode Fayemi
 
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxraffaeleoman
 
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfThe workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfSenaatti-kiinteistöt
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesPooja Nehwal
 
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfAWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfSkillCertProExams
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Hasting Chen
 
Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Vipesco
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatmentnswingard
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaKayode Fayemi
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyPooja Nehwal
 

Dernier (20)

lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
 
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
 
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
 
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
 
Dreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIIDreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio III
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar Training
 
Air breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animalsAir breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animals
 
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxMohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
 
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
 
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfThe workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
 
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfAWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
 
Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatment
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
 

More Complete Resultset Retrieval from Large Heterogeneous RDF Sources

  • 1. More Complete Resultset Retrieval from Large Heterogeneous RDF Sources Andre Valdestilhas Tommaso Soru Muhammad Saleem AKSW Group, University of Leipzig, Germany November 24, 2019 Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 1 / 17
  • 2. Outline Motivation Approach Experiments Conclusion & Future work Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 2 / 17
  • 3. Motivation The LOD Cloud 2007 2009 2019 Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 3 / 17
  • 4. Motivation Where to find RDF datasets? 9,960 raw RDF datasets658,206 Datasets (HDT files) LODLaundromat Which Dataset? ... 559 Endpoint Different formats1 Query more than 221 billion triples (> 5 Terabytes) 1Serialization. Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 4 / 17
  • 5. Example Where to find RDF datasets? Authors that have a paper type poster/demo in the proceedings of ISWC 20082 2Query from FEDBench Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 5 / 17
  • 6. Example Where to find RDF datasets? Authors that have a paper type poster/demo in the proceedings of ISWC 20083 4 HDT datasets4 containing data that can answer the query 3Query from FEDBench 4Semantic Web Dog Food from LOD Laundromat datasets. Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 6 / 17
  • 7. Motivation Approaches (+) Multiple SPARQL endpoints (-) 90% are dump files (+) Dereferenceable URIs (-) 43% of the URIs are non-dereferenceable Endpoint HDT file file.rdf Dump_file_2 WIMU Where is my URI? (+) Data from non-dereferenceable URIs (-) No SPARQL query Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 7 / 17
  • 8. The approach A hybrid SPARQL query engine Collect data from multiple SPARQL endpoints, Data from RDF dumps including HDT files and use Link Traversal Link Traversal, obtaining data from non-dereferenceable URIs using WIMUa aWhere is my URI?(WIMU) http://wimu.aksw.org/ Resulting in More complete results Experiments with 3 state-of-the-art SPARQL query benchmarks, LargeRDFBench, FedBench and FEASIBLE Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 8 / 17
  • 9. The approach Select ?p ?o Where {<http://uri.com> ?p ?o} Endpoint hdt file dump.bz2 file.rdf ... http://uri1.com http://uri2.com http://uriN.com Extract URIs WIMU 1 2 3 Data Dumps Query processor Traversal Based Query processor Union of the results Source Filtering SPARQL-a-lot Query processor SPARQL Endpoint  Query processor wimuQ query execution engine Results <subject1><predicate1><object1> <subject2><predicate2><object2> <subjectN><predicateN><objectN> 4 5 6 Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 9 / 17
  • 10. The approach The source selection Identify relevant datasets from WIMU Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 10 / 17
  • 11. Evaluation Hypothesis Identify automatically relevant sources from heterogeneous RDF data, even with non-dereferenceable URIs, can improve the resultset retrieval Metrics Coverage and runtime Approaches FedX (endpoints), SQUIN (Traversal-based), SPARQL-a-lot and WIMU(dumps) Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 11 / 17
  • 12. Evaluation Experimental setup Datasets 221.7 billion triples (>5 terabytes) Queries 415 queries from FedBench, LargeRDFBench and FEASIBLE Each query executed 5 times Hardware 200 GB HD, 8GB RAM, 2.70GHZ single core processor Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 12 / 17
  • 13. Evaluation Coverage: Overall 76% queries with results(Zero results=non-public endpoints/data - non-indexed) CD LS LD Simple Comp Large Chs Dbpedia SWDF FedBench | |LargeRDFBench Feasible 100 1000 10000 100000 Averagenumberofresults onlogscale EndPoints(FedX) SPARQL-a-lot LinkTraversal(SQUIN) wimuDumps wimuQ Approaches and the best coverage FedBench 55% endpoints LargeRDFBench 81% wimuDumps FEASIBLE 98% wimuDumps Observation The combination of those query processing engines implies more resultset retrieval Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 13 / 17
  • 14. Evaluation Number of datasets More datasets discovered does not implies in more results Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 14 / 17
  • 15. Evaluation Runtime Total Average 17 minutes across 3 benchmarks (wimuDumps 2 min, Endpoints 13 min, SPARQL-a-lot 58 sec, LinkTraversal-SQUIN 36 sec) CD LS LD Simple Comp Large Chs Dbpedia SWDF FedBench LargeRDFBench Feasible | | 1 10 Averagerun-time(minutes) onlogscale EndPoints(FedX) SPARQL-a-lot LinkTraversal(SQUIN) wimuDumps wimuQ Interesting wimuQ takes 91% of results from wimuDumps, only 7% from SPARQL endpoints. Possible reason, SPARQL endpoint federation split among multiple endpoints, network and number of intermediate results influence in the runtime Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 15 / 17
  • 16. Conclusion & Future works Conclusion A hybrid SPARQL query processing engine to execute SPARQL queries over a large amount of heterogeneous RDF data Evaluation on real world datasets using the state of the art of federated and non-federated query benchmarks (FedBench, LargeRDFBench and FEASIBLE) We present the first federated SPARQL query processing engine that executes SPARQL queries over a total of 221.7 billion triples Future work Add more URIs into WIMU index and use Triple Pattern Fragments A Large Scale approach to study the relation and similarity among the datasets Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 16 / 17
  • 17. That’s all Folks! Thanks! Questions?Github repository: https://github.com/firmao/wimuT Prototype: https://w3id.org/wimuq/ Contact: valdestilhas@informatik.uni-leipzig.de Special thanks to my PhD. advisor Prof. Dr. rer. nat. Thomas Riechert Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 17 / 17