23rd International World Wide Web Conference, 10th April 2014, Seoul, Korea
TripleProv: Efficient Processing of Lineage Queries over a Native RDF Store
Marcin Wylot (1), Philippe Cudré-Mauroux (1), and Paul Groth (2)
1) eXascale Infolab, University of Fribourg, Switzerland
2) Web & Media Group, VU University Amsterdam, Netherlands
Outline
➢ Motivation
➢ Provenance Polynomials
➢ System
➢ Results
Data Provenance
“Provenance is information about
entities, activities, and people involved
in producing a piece of data or thing, which can be used to form
assessments about its quality, reliability or trustworthiness.”
How a query answer was derived: what data was
combined to produce the result.
Data Integration
➢ Integrated and summarized data
➢ Trust, transparency, and cost
➢ Capability to pinpoint the exact source from which the result was selected
➢ Capability to trace back the complete list of sources and how they were combined to deliver a result
Querying Distributed Data Sources
How exactly was the answer derived?
Application: Post-query Calculations
➢ Scores or probabilities for query results
➢ Result ranking
➢ Trust computation
➢ Information quality based on the sources used
Application: Query Execution
➢ Modify query strategies on the fly
➢ Restrict results to a certain subset of sources
➢ Restrict results w.r.t. queries over provenance
➢ Access control: only certain sources will appear
➢ Detect whether a result would remain valid if a certain source were removed
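The last two kinds of check can be illustrated with plain sets. The sketch below is a hedged illustration in Python, not TripleProv's implementation: it assumes we already know, for each query constraint, which sources can satisfy it. The source labels, their grouping, and the helper names still_valid and restrict_to are hypothetical.

```python
# Each inner set lists the sources that can satisfy one query constraint.
# The labels and their grouping are hypothetical.
constraint_sources = [
    {"l1", "l2", "l3"},
    {"l4", "l5"},
    {"l6"},
]


def still_valid(constraint_sources, removed):
    """True if every constraint keeps at least one source after removal."""
    return all(len(sources - removed) > 0 for sources in constraint_sources)


def restrict_to(constraint_sources, allowed):
    """Keep only allowed sources; None if some constraint loses all of them."""
    restricted = [sources & allowed for sources in constraint_sources]
    return restricted if all(restricted) else None


# l2 and l3 still cover the first constraint, so the result survives.
print(still_valid(constraint_sources, {"l1"}))
# The third constraint depends on l6 alone, so the result is lost.
print(still_valid(constraint_sources, {"l6"}))
```

A result survives only if every constraint keeps at least one source, which is why removing the single source of the third constraint invalidates it.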
Provenance Polynomials
➢ Ability to characterize the ways in which each source contributed
➢ Pinpoint the exact source of each result
➢ Trace back the list of sources and the way they were combined to deliver a result
Graph-based Query
select ?lat ?long ?g1 ?g2 ?g3 ?g4
where {
graph ?g1 {?a [] "Eiffel Tower" . }
graph ?g2 {?a inCountry FR . }
graph ?g3 {?a lat ?lat . }
graph ?g4 {?a long ?long . }
}
lat long l1 l2 l4 l4,
lat long l1 l2 l4 l5,
lat long l1 l2 l5 l4,
lat long l1 l2 l5 l5,
lat long l1 l3 l4 l4,
lat long l1 l3 l4 l5,
lat long l1 l3 l5 l4,
lat long l1 l3 l5 l5,
lat long l2 l2 l4 l4,
lat long l2 l2 l4 l5,
lat long l2 l2 l5 l4,
lat long l2 l2 l5 l5,
lat long l2 l3 l4 l4,
lat long l2 l3 l4 l5,
lat long l2 l3 l5 l4,
lat long l2 l3 l5 l5,
lat long l3 l2 l4 l4,
lat long l3 l2 l4 l5,
lat long l3 l2 l5 l4,
lat long l3 l2 l5 l5,
lat long l3 l3 l4 l4,
lat long l3 l3 l4 l5,
lat long l3 l3 l5 l4,
lat long l3 l3 l5 l5,
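The listing above is the Cartesian product of the sources matching each graph pattern. The short Python sketch below reproduces it; the per-pattern source lists are read off the columns of the listing, and the variable names are assumptions made for illustration.

```python
from itertools import product

# Sources matching ?g1..?g4, read off the four columns of the listing.
per_graph_sources = [
    ["l1", "l2", "l3"],  # ?g1
    ["l2", "l3"],        # ?g2
    ["l4", "l5"],        # ?g3
    ["l4", "l5"],        # ?g4
]

# One row per combination of sources, in the same order as the listing.
rows = list(product(*per_graph_sources))

# Cells needed by the flat listing versus symbols in a factored form.
flat_cells = len(rows) * len(per_graph_sources)
factored_symbols = sum(len(sources) for sources in per_graph_sources)

print(len(rows), flat_cells, factored_symbols)
```

The flat answer repeats the same (lat, long) pair once per combination, while a factored form only needs to list each source once per pattern.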
TripleProv Results
result:
lat, long
provenance polynomial:
(l1 ⊕ l2 ⊕ l3) ⊗ (l4 ⊕ l5) ⊗ (l6 ⊕ l7) ⊗ (l8 ⊕ l9)
Polynomial Operators
➢ Union (⊕)
○ a constraint or projection is satisfied by multiple sources
l1 ⊕ l2 ⊕ l3
○ multiple entities satisfy a set of constraints or projections
➢ Join (⊗)
○ sources are joined to handle a constraint or a projection
○ object-subject (OS) and object-object (OO) joins between a few sets of constraints
(l1 ⊕ l2) ⊗ (l3 ⊕ l4)
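The two operators can be modelled as a small expression tree. The following Python sketch is only an illustration; the class names Source, Union, and Join and the helper sources() are assumptions introduced here, not part of TripleProv.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Union:
    terms: tuple  # alternatives: any one of them satisfies the constraint

    def __str__(self):
        return "(" + " ⊕ ".join(str(t) for t in self.terms) + ")"


@dataclass(frozen=True)
class Join:
    factors: tuple  # all factors are needed together

    def __str__(self):
        return " ⊗ ".join(str(f) for f in self.factors)


def sources(node):
    """Collect the names of all sources mentioned in a polynomial."""
    if isinstance(node, Source):
        return {node.name}
    children = node.terms if isinstance(node, Union) else node.factors
    found = set()
    for child in children:
        found |= sources(child)
    return found


l1, l2, l3, l4 = (Source(n) for n in ("l1", "l2", "l3", "l4"))
poly = Join((Union((l1, l2)), Union((l3, l4))))
print(poly)
print(sorted(sources(poly)))
```

Note that this simple renderer does not bracket a Join nested inside another Join; that would need extra care for polynomials like the multi-entity examples.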
Example Polynomial
select ?lat ?long where {
?a [] "Eiffel Tower" .
?a inCountry FR .
?a lat ?lat .
?a long ?long .
}
(l1 ⊕ l2 ⊕ l3) ⊗ (l4 ⊕ l5) ⊗ (l6 ⊕ l7) ⊗ (l8 ⊕ l9)
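A polynomial of this shape can feed post-query calculations such as a trust score. The Python sketch below is one possible interpretation, not the method used by TripleProv: it assumes hypothetical per-source trust values in [0, 1], reads ⊕ as "take the most trusted alternative" (max), and reads ⊗ as "a join is only as trusted as its weakest part" (min).

```python
# The polynomial above in product-of-unions form: one set per triple pattern.
polynomial = [
    {"l1", "l2", "l3"},
    {"l4", "l5"},
    {"l6", "l7"},
    {"l8", "l9"},
]

# Hypothetical trust values, invented for this illustration.
trust = {
    "l1": 0.9, "l2": 0.4, "l3": 0.7,
    "l4": 0.6, "l5": 0.8,
    "l6": 0.5, "l7": 0.3,
    "l8": 1.0, "l9": 0.2,
}


def trust_score(polynomial, trust):
    """Evaluate the polynomial with ⊕ as max and ⊗ as min."""
    return min(max(trust[s] for s in factor) for factor in polynomial)


print(trust_score(polynomial, trust))
```

Other choices of operators (for example, probabilistic ones) would fit the same structure; only the two functions applied to ⊕ and ⊗ change.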
Example Polynomial
select ?l ?long ?lat where
{
?p name "Krebs, Emil" .
?p deathPlace ?l .
?c [] ?l .
?c featureClass P .
?c inCountry DE .
?c long ?long .
?c lat ?lat .
}
[(l1 ⊕ l2 ⊕ l3) ⊗ (l4 ⊕ l5)]
⊗
[(l6 ⊕ l7) ⊗ (l8) ⊗ (l9 ⊕ l10) ⊗ (l11 ⊕ l12) ⊗ (l13)]
Granularity Levels
➢ source-level: sources of the triples
➢ triple-level: all pieces of data used to answer the query
(l1 ⊕ l2) ⊗ (l3 ⊕ l4)
System Architecture
Native Data Model
➢ Semantically co-located data
➢ Template based molecules
Various Physical Storage Models
Differences:
➢ ease of implementation
➢ memory consumption
➢ query execution
➢ interference with the original concept of molecule
1) SPOL 2) LSPO 3) SLPO 4) SPLO
Annotated Triples
➢ Annotated provenance
➢ Quadruples
➢ Easy to implement
➢ Source data repeated for each triple
Co-located Elements
➢ Data grouped by source
➢ Physically co-located
➢ Avoids duplication of the same source inside a molecule
➢ Data about a given subject co-located in one molecule
➢ More difficult to implement
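The difference between the two layouts can be shown with a toy in-memory example. This Python sketch is an assumption-laden illustration, not TripleProv's physical storage: the sample quads, the subject name, and the helper colocate() are hypothetical. The flat list mimics annotated quadruples, and the nested mapping mimics grouping by source inside a per-subject molecule.

```python
from collections import defaultdict

# Annotated layout: each triple carries its own source label (quadruples).
# The data is made up for illustration.
quads = [
    ("EiffelTower", "inCountry", "FR", "l1"),
    ("EiffelTower", "lat", "48.858", "l1"),
    ("EiffelTower", "long", "2.294", "l1"),
    ("EiffelTower", "label", "Eiffel Tower", "l2"),
]


def colocate(quads):
    """Group triples by subject, then by source: subject -> source -> [(p, o)]."""
    molecules = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj, source in quads:
        molecules[subject][source].append((predicate, obj))
    return molecules


molecules = colocate(quads)

# How many times a source label is stored in each layout.
annotated_source_refs = len(quads)
colocated_source_refs = sum(len(by_source) for by_source in molecules.values())

print(annotated_source_refs, colocated_source_refs)
```

In the annotated form the label l1 is stored three times; in the co-located form it appears once per molecule, at the cost of a more involved structure.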
Experiments
How expensive is it to trace provenance?
What is the overhead on query execution time?
Datasets
➢ Two collections of RDF data gathered from the Web
○ Billion Triple Challenge (BTC): crawled from the Linked Open Data cloud
○ Web Data Commons (WDC): RDFa and Microdata extracted from the Common Crawl
➢ Typical collections gathered from multiple sources
➢ Sampled subsets of ~110 million triples each; ~25 GB each
Workloads
➢ 8 queries defined for BTC
○ T. Neumann and G. Weikum. Scalable join processing on very large RDF graphs. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pages 627–640. ACM, 2009.
➢ Two additional queries with UNION and OPTIONAL clauses
➢ 7 new queries for WDC
http://exascale.info/tripleprov
Results
Figure: overhead of tracking provenance compared to the vanilla version of the system for the BTC dataset (series: source-level co-located, source-level annotated, triple-level co-located, triple-level annotated)
Conclusions
➢ Provenance overhead is considerable but acceptable, on average about 60-70%
➢ The most suitable storage model depends on data and workload characteristics
➢ Annotated: more appropriate for heterogeneous datasets and workloads retrieving provenance
➢ Co-located: more appropriate for homogeneous datasets and workloads filtering by source
Future Work
➢ Distributed version
➢ Dynamic storage model
➢ Adaptive query execution strategies
➢ PROV output
➢ Queries over provenance
Summary
➢ TripleProv: an efficient triplestore tracking provenance
➢ Two storage models
➢ Fine-grained multilevel provenance tracing
➢ Formal provenance polynomials
➢ Experimental evaluation
http://exascale.info/tripleprov
Loading & Memory
Billion Triple Challenge
Web Data Commons
Results
Figure: overhead of tracking provenance compared to the vanilla version of the system for the WDC dataset (series: source-level SLPO, source-level SPOL, triple-level SLPO, triple-level SPOL)
Polynomials: multiple records
[(l1 ⊕ l2 ⊕ l3) ⊗ (l4 ⊕ l5) ⊗ (l6 ⊕ l7) ⊗ (l8 ⊕ l9)]
⊕
[(l5 ⊕ l7) ⊗ (l4) ⊗ (l13 ⊕ l17) ⊗ (l28)]
⊕
[(l4) ⊗ (l1 ⊕ l2) ⊗ (l3 ⊕ l7) ⊗ (l8 ⊕ l9 ⊕ l4)]
