Diefficiency Metrics:
Measuring the Continuous Efficiency of Query Processing Approaches
Maribel Acosta, Maria-Esther Vidal, York Sure-Vetter
Presented at the International Semantic Web Conference 2017
Best Resource Paper Nominee
Motivation (1)
Query: Retrieve DBpedia resources classified as alcohols that have SMILES identifiers.
SELECT ?d1 WHERE {
  ?d1 dcterms:subject dbc:Alcohols .
  ?d1 dbp:smiles ?s .
}
Blocking Approach: Produces all results at the end of execution.
Input → Query Engine → Output
Answer                        Time
{?d1 → dbr:Zuclopenthixol}    0.37 sec.
{?d1 → dbr:Ziprepol}          0.37 sec.
{?d1 → dbr:Viminol}           0.37 sec.
{?d1 → dbr:Trifluperidol}     0.37 sec.
{?d1 → dbr:Trabectedin}       0.37 sec.
{?d1 → dbr:Tolvaptan}         0.37 sec.
Motivation (1)
Query (as above): Retrieve DBpedia resources classified as alcohols that have SMILES identifiers.
Incremental Approach: Produces results as soon as they are ready, e.g., ANAPSID, nLDE, TPF Client.
Input → Query Engine → Output
Answer                        Time
{?d1 → dbr:Zuclopenthixol}    0.33 sec.
{?d1 → dbr:Ziprepol}          0.35 sec.
{?d1 → dbr:Viminol}           0.35 sec.
{?d1 → dbr:Trifluperidol}     0.36 sec.
{?d1 → dbr:Trabectedin}       0.36 sec.
{?d1 → dbr:Tolvaptan}         0.37 sec.
Motivation (2)
Traditional Metrics
Metric                          nLDE Not Adaptive   nLDE Selective   nLDE Random
Time for First Answer (sec.)    0.37                0.24             0.33
Execution Time (sec.)           10.59               12.10            9.30
Throughput (answers/sec.)       486.27              421.87           553.66
Completeness                    100%                100%             100%
Overall, nLDE Random outperforms the other approaches.
Continuous Performance
[Figure: answers produced by the query engine over time; x-axis: Time (sec.), y-axis: #Answers Produced; series: nLDE Not Adaptive, nLDE Selective, nLDE Random.]
nLDE Not Adaptive outperforms the other approaches in the first 7.5 sec. of execution.
Motivation (3)
We need quantitative methods to measure the continuous efficiency of query processing approaches.
Related Work
Current Performance Metrics
Effectiveness:
• Answer Completeness [Guo05] [Montoya12]
• Correctness [Zhang12]
• Answer Soundness [Guo05]
Efficiency:
• Execution Time [Guo05] [Bizer09] [Montoya12] [Zhang12]
• Loading Time [Guo05]
• Throughput [Zhang12]
• Time for the First Tuple [Acosta11]
• Queries per Second [Bizer09]
• Average Slowdown [Sharaf08]
Effectiveness and efficiency combined:
• Combined Metric [Guo05]
These metrics do not consider continuous performance; they are not tailored to benchmark incremental approaches.
Our Approach: Measuring Continuous Efficiency
Diefficiency Metrics
• Diefficiency: continuous efficiency.
  • Combination of the Greek prefix di(a)- (which means “through” or “across”) and efficiency.
• Continuous performance of approaches is recorded in answer traces.
• Our metrics quantify the diefficiency of incremental approaches.
Answer                        Time
{?d1 → dbr:Zuclopenthixol}    0.33
{?d1 → dbr:Ziprepol}          0.35
{?d1 → dbr:Viminol}           0.35
{?d1 → dbr:Trifluperidol}     0.36
{?d1 → dbr:Tolvaptan}         0.37
Answer Distribution Function
• Defined as $X: [0; t_n] \to \mathbb{N}$.
• $t_n$ is the point in time when the last answer was produced.
• $X(x)$ indicates the number of answers produced until the time $x$.
• $X$ is built from answer traces (applying linear interpolations).
Answer Trace
Answer                        Time
{?d1 → dbr:Zuclopenthixol}    0.33
{?d1 → dbr:Ziprepol}          0.35
{?d1 → dbr:Viminol}           0.35
{?d1 → dbr:Trifluperidol}     0.36
{?d1 → dbr:Tolvaptan}         0.37
…
[Figure: Answer Distribution Function for Q9.sparql; x-axis: Time, y-axis: #Answers Produced; series: nLDE Not Adaptive, nLDE Selective, nLDE Random.]
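As a rough illustration of how an answer distribution function could be built from an answer trace, the sketch below uses the five timestamps of the trace on this slide. The name `build_adf` and the input format (one timestamp per answer) are hypothetical. Two choices are assumptions rather than statements from the slides: X is taken to be 0 before the first answer, and values between two observations are linearly interpolated floats rather than natural numbers.

```python
from bisect import bisect_right


def build_adf(answer_times):
    """Return X(x): the (interpolated) number of answers produced until time x."""
    times = sorted(answer_times)

    def X(x):
        # Assumption: no answers are counted before the first timestamp.
        if not times or x < times[0]:
            return 0.0
        # At or after the last timestamp, every answer has been produced.
        if x >= times[-1]:
            return float(len(times))
        # j answers have timestamps <= x, so times[j - 1] <= x < times[j].
        j = bisect_right(times, x)
        t0, t1 = times[j - 1], times[j]
        # Linear interpolation between the points (t0, j) and (t1, j + 1).
        return j + (x - t0) / (t1 - t0)

    return X


# Timestamps (sec.) of the five answers in the trace on the slide.
X = build_adf([0.33, 0.35, 0.35, 0.36, 0.37])
print(X(0.30), X(0.35), X(0.37))  # prints: 0.0 3.0 5.0
```

When several answers share a timestamp, as with the two answers at 0.35, this sketch reports the highest count reached at that instant.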
Metric dief@t
• Quantifies diefficiency during the first $t$ time units of execution.
• Measures the area under the curve of $X(x)$ in the interval $[0; t]$.
$\mathrm{dief@}t := \int_{0}^{t} X(x)\,dx$
dief@t interpretation: Higher is better.
[Figure: answers produced over time; x-axis: Time (sec.), y-axis: #Answers Produced; series: nLDE Not Adaptive, nLDE Selective, nLDE Random.]
Not Adaptive   Selective   Random
7323.46        1148.63     5031.90
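To make the area-under-the-curve idea concrete, here is a minimal sketch that approximates dief@t with the trapezoidal rule. The function name `dief_at_t` and both traces are hypothetical. The sketch assumes that X is 0 before the first answer, that X passes through the points (t_i, i), and that integration stops at the last answer when t lies beyond it. It is not the implementation of the dief R package.

```python
def dief_at_t(answer_times, t):
    """Approximate dief@t: area under X on [0, t] using the trapezoidal rule."""
    times = sorted(answer_times)
    # Points (t_i, i) of X for the answers produced up to time t.
    points = [(ti, i) for i, ti in enumerate(times, start=1) if ti <= t]
    seen = len(points)
    # If t falls between two answers, add the interpolated end point (t, X(t)).
    if 0 < seen < len(times):
        t0, t1 = times[seen - 1], times[seen]
        points.append((t, seen + (t - t0) / (t1 - t0)))
    # Sum the trapezoids between consecutive points.
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Two hypothetical traces with the same execution time and completeness.
early = [1.0, 2.0, 3.0, 4.0]
late = [2.0, 3.0, 3.5, 4.0]
print(dief_at_t(early, 4.0), dief_at_t(late, 4.0))  # prints: 7.5 4.5
```

Both hypothetical traces finish at 4.0 sec. with four answers, yet the one that delivers answers earlier accumulates more area, which matches the "higher is better" reading of dief@t.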
Metric dief@k
• Quantifies diefficiency while producing the first $k$ answers.
• Measures the area under the curve of $X(x)$ in the interval $[0; t_k]$.
• $t_k$ is the point in time where the $k$th answer is produced.
$\mathrm{dief@}k := \int_{0}^{t_k} X(x)\,dx$
dief@k interpretation: Lower is better.
[Figure: answers produced over time, with k = 2000 marked; x-axis: Time (sec.), y-axis: #Answers Produced; series: nLDE Not Adaptive, nLDE Selective, nLDE Random.]
Not Adaptive   Selective   Random
4686.11        3235.67     3517.85
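A similar sketch for dief@k integrates only up to the timestamp of the k-th answer. The function name `dief_at_k` and the traces are hypothetical, and the same assumptions apply as for any trapezoidal approximation over the points (t_i, i): X is 0 before the first answer and linear between observations.

```python
def dief_at_k(answer_times, k):
    """Approximate dief@k: area under X on [0, t_k] using the trapezoidal rule."""
    times = sorted(answer_times)
    if not 1 <= k <= len(times):
        raise ValueError("k must be between 1 and the number of answers")
    # Points (t_i, i) of X up to the k-th answer; the last point is (t_k, k).
    points = [(ti, i) for i, ti in enumerate(times[:k], start=1)]
    # Sum the trapezoids between consecutive points.
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical traces: the second approach needs twice as long per answer.
quick = [1.0, 2.0, 3.0, 4.0]
sluggish = [2.0, 4.0, 6.0, 8.0]
print(dief_at_k(quick, 3), dief_at_k(sluggish, 3))  # prints: 4.0 8.0
```

For a fixed k, the approach that reaches the k-th answer sooner sweeps less area in this example, in line with the "lower is better" reading of dief@k.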
Extensions of dief@t and dief@k
Measuring diefficiency at any time interval
• With dief@t it is possible to measure the diefficiency of an approach during the interval $[t_a; t_b]$, as follows:
$\mathrm{dief@}t_b - \mathrm{dief@}t_a$
[Figure: answers produced over time; x-axis: Time, y-axis: #Answers Produced; series: nLDE Not Adaptive, nLDE Selective, nLDE Random.]
Not Adaptive   Selective   Random
5073.37        869.18      4024.21
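The difference of two dief@t values is the area under X restricted to the interval, by additivity of the integral. A short derivation, assuming $0 \le t_a \le t_b$ and that both points lie within the execution interval on which X is defined:

```latex
\begin{align*}
\mathrm{dief@}t_b - \mathrm{dief@}t_a
  &= \int_{0}^{t_b} X(x)\,dx - \int_{0}^{t_a} X(x)\,dx \\
  &= \int_{t_a}^{t_b} X(x)\,dx .
\end{align*}
```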
Extensions of dief@t and dief@k
Measuring diefficiency between the $k_a$th and $k_b$th answers
• With dief@k it is possible to measure the diefficiency of an approach while producing the answers between the $k_a$th and the $k_b$th (with $k_a \le k_b$), as follows:
$\mathrm{dief@}k_b - \mathrm{dief@}k_a$
[Figure: answers produced over time; x-axis: Time, y-axis: #Answers Produced; series: nLDE Not Adaptive, nLDE Selective, nLDE Random.]
Not Adaptive   Selective   Random
5847.05        5457.67     3468.71
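The subtraction can be sketched in a few lines. The helper names `_area`, `dief_at_k`, and `dief_between_k` and the trace are hypothetical. As a simplifying assumption, the area is approximated with the trapezoidal rule over the points (t_i, i), and k values are assumed to be valid (between 1 and the number of answers).

```python
def _area(points):
    """Trapezoidal area under the polyline through the given (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def dief_at_k(answer_times, k):
    """Approximate dief@k over the points (t_i, i) for i = 1..k."""
    times = sorted(answer_times)
    return _area([(ti, i) for i, ti in enumerate(times[:k], start=1)])


def dief_between_k(answer_times, k_a, k_b):
    """Diefficiency while producing the k_a-th to k_b-th answers (k_a <= k_b)."""
    return dief_at_k(answer_times, k_b) - dief_at_k(answer_times, k_a)


trace = [1.0, 2.0, 3.0, 4.0, 6.0]  # hypothetical answer timestamps
print(dief_between_k(trace, 2, 4))  # prints: 6.0
```

In this example the result equals the area between the points (2, 2), (3, 3), and (4, 4), that is 2.5 + 3.5 = 6.0.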
Properties of dief@t and dief@k
Analytical Relationship Between dief@t and dief@k
Let $t_k$ be the point in time when the $k$th answer is produced. Then:
$\mathrm{dief@}t_k = \mathrm{dief@}k$
Theorem 1:
The diefficiency of blocking approaches is always zero.
Theorem 2:
In queries where the number of answers is greater than one, the total diefficiency of incremental approaches is higher than zero.
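These properties can be checked numerically on the two traces from the Motivation slides. The sketch below is an illustration, not a proof. The function names are hypothetical, the area is approximated with the trapezoidal rule over the points (t_i, i), and `dief_at_t` is simplified on the assumption that t coincides with a recorded timestamp.

```python
def _area(points):
    """Trapezoidal area under the polyline through the given (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def dief_at_t(answer_times, t):
    # Simplification: t is assumed to be a recorded timestamp, so no
    # interpolated end point is needed.
    times = sorted(answer_times)
    return _area([(ti, i) for i, ti in enumerate(times, start=1) if ti <= t])


def dief_at_k(answer_times, k):
    times = sorted(answer_times)
    return _area([(ti, i) for i, ti in enumerate(times[:k], start=1)])


# Traces from the slides: six answers at 0.37 sec. (blocking) and
# six answers between 0.33 and 0.37 sec. (incremental).
blocking = [0.37] * 6
incremental = [0.33, 0.35, 0.35, 0.36, 0.36, 0.37]

# Relationship: dief@t_k equals dief@k for every k.
for k in range(1, len(incremental) + 1):
    t_k = sorted(incremental)[k - 1]
    assert abs(dief_at_t(incremental, t_k) - dief_at_k(incremental, k)) < 1e-12

# Theorem 1: all answers share one timestamp, so the area is zero.
assert dief_at_t(blocking, 0.37) == 0.0
# Theorem 2: more than one answer at distinct times gives a positive area.
assert dief_at_t(incremental, 0.37) > 0.0
```

The blocking trace yields zero because every trapezoid has zero width, which is the intuition behind Theorem 1.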
Empirical Study
Experimental Settings
• Query engine: nLDE [Acosta15] with three configurations:
  • nLDE Not Adaptive (NA)
  • nLDE Selective (Sel)
  • nLDE Random (Ran)
• Queries and dataset:
  • nLDE Benchmark 1: 16 non-selective queries (4–14 triple patterns)
  • DBpedia dataset (v. 2015)
• Technical specifications: Debian Wheezy 64 bit with CPU 2x Intel(R) Xeon(R) CPU E5-2670 2.60GHz (16 physical cores), and 256GB RAM.
Comparing dief@t with Other Metrics (1)
Results for Query Q17
[Figure: answers produced over time for Q17.sparql; x-axis: Time, y-axis: #Answers Produced; series: nLDE Not Adaptive, nLDE Selective, nLDE Random.]
[Figure: plot comparing NA, Ran, and Sel on five axes: (Time for the First Tuple)^-1 (TFFT^-1), (Execution Time)^-1 (ET^-1), Completeness (Comp), Throughput (T), and dief@t.]
Plot interpretation: Higher is better.
Uncovered pattern: Ran outperforms NA.
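The time-based metrics are inverted so that every axis reads "higher is better". A small sketch of how the four traditional axes might be derived from an answer trace follows. The function name, the dictionary keys, and the trace are hypothetical. It also assumes that execution ends when the last answer is produced, which the slides do not state.

```python
def traditional_metrics(answer_times, total_answers):
    """Derive the four traditional plot axes from one answer trace."""
    times = sorted(answer_times)
    time_first_tuple = times[0]
    # Assumption: execution time is taken as the timestamp of the last answer.
    execution_time = times[-1]
    return {
        "tfft_inv": 1.0 / time_first_tuple,       # (Time for the First Tuple)^-1
        "et_inv": 1.0 / execution_time,           # (Execution Time)^-1
        "comp": len(times) / total_answers,       # Completeness
        "throughput": len(times) / execution_time,  # answers per time unit
    }


# Hypothetical trace: four of four expected answers.
print(traditional_metrics([0.5, 1.0, 2.0, 4.0], total_answers=4))
```

None of these four values depends on when the intermediate answers arrive, which is why two approaches can tie on them and still differ in dief@t.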
Comparing dief@t with Other Metrics (2)
Queries in which dief@t uncovers unknown patterns
Measuring Answer Rate with dief@k (1)
Results for Query Q2
[Figure: answers produced over time for Q2.sparql; x-axis: Time, y-axis: #Answers Produced; series: nLDE Not Adaptive, nLDE Selective, nLDE Random.]
[Figure: dief@k of NA, Ran, and Sel at k = 25%, 50%, 75%, and 100% of the answers.]
Plot interpretation: Lower is better.
• Sel produces the first 25% of the answers slower than Ran.
• Sel produces the last portions of the answer at a faster rate.
Measuring Answer Rate with dief@k (2)
Only in these queries, all the approaches produced results at a uniform rate.
Conclusions & Outlook
Conclusions
dief@t and dief@k: Measure the diefficiency of incremental approaches.
• We have demonstrated the theoretical soundness of the metrics.
• Our empirical study indicates that dief@t and dief@k allow for uncovering performance particularities.
• A final remark: dief@t and dief@k can measure the performance of any incremental approach.
  ✔ Streaming query processing ✔ Top-k ✔ Monotonic reasoning ✔ Crowdsourcing
Available Resources
• dief R package to compute dief@t and dief@k: https://github.com/maribelacosta/dief
• Jupyter notebook: https://github.com/maribelacosta/dief-notebooks
• Online demo: http://km.aifb.kit.edu/services/dief-app/
References
[Acosta15] M. Acosta and M.-E. Vidal. Networks of linked data eddies: An adaptive web query processing engine for RDF data. In ISWC, pages 111–127, 2015.
[Acosta11] M. Acosta, M.-E. Vidal, J. Castillo, T. Lampo, and E. Ruckhaus. ANAPSID: An adaptive query processing engine for SPARQL endpoints. In ISWC, pages 18–34, 2011.
[Bizer09] C. Bizer and A. Schultz. The Berlin SPARQL benchmark. Int. J. Semantic Web Inf. Syst., 5(2):1–24, 2009.
[Guo05] Y. Guo, Z. Pan, and J. Heflin. LUBM: A benchmark for OWL knowledge base systems. Web Semant., 3(2-3):158–182, Oct. 2005.
[Montoya12] G. Montoya, M.-E. Vidal, Ó. Corcho, E. Ruckhaus, and C. B. Aranda. Benchmarking federated SPARQL query engines: Are existing testbeds enough? In ISWC, pages 313–324, 2012.
[Sharaf08] M. A. Sharaf, P. K. Chrysanthis, A. Labrinidis, and K. Pruhs. Algorithms and metrics for processing multiple heterogeneous continuous queries. ACM Trans. Database Syst., 33(1):5:1–5:44, 2008.
[Zhang12] Y. Zhang, M. Pham, Ó. Corcho, and J. Calbimonte. SRBench: A streaming RDF/SPARQL benchmark. In ISWC, pages 641–657, 2012.

Diefficiency Metrics: Measuring the Continuous Efficiency of Query Processing Approaches

  • 1. Diefficiency Metrics: Measuring the Continuous Efficiency of Query Processing Approaches Maribel Acosta, Maria-Esther Vidal, York Sure-Vetter Presented at the International Semantic Web Conference 2017 Best Resource Paper Nominee
  • 2. Motivation (1) SELECT ?d1 WHERE { ?d1 dcterms:subject dbc:Alcohols . ?d1 dbp:smiles ?s .} Retrieve resources classified as DBpedia that have SMILES identifiers. Query: Query Engine 2 Answer Time {?d1 à dbr:Zuclopenthixol} 0.37 sec. {?d1 à dbr:Ziprepol} 0.37 sec. {?d1 à dbr:Viminol} 0.37 sec. {?d1 à dbr:Trifluperidol} 0.37 sec. {?d1 à dbr:Trabectedin} 0.37 sec. {?d1 à dbr:Tolvaptan} 0.37 sec. Blocking Approach: Produces all results at the end of execution. Input Output
  • 3. Motivation (1) SELECT ?d1 WHERE { ?d1 dcterms:subject dbc:Alcohols . ?d1 dbp:smiles ?s .} Retrieve resources classified as DBpedia that have SMILES identifiers. Query: 3 Answer Time {?d1 à dbr:Zuclopenthixol} 0.33 sec. {?d1 à dbr:Ziprepol} 0.35 sec. {?d1 à dbr:Viminol} 0.35 sec. {?d1 à dbr:Trifluperidol} 0.36 sec. {?d1 à dbr:Trabectedin} 0.36 sec. {?d1 à dbr:Tolvaptan} 0.37 sec. Incremental Approach: Produces results as soon as they are ready, e.g., ANAPSID, nLDE, TPF Client. Query Engine Input Output
  • 4. Motivation (2) 4 Metrics nLDE Not Adaptive nLDE Selective nLDE Random Time First Answer (sec.) 0.37 0.24 0.33 Execution Time (sec.) 10.59 12.10 9.30 Throughput (answer/sec.) 486.27 421.87 553.66 Completeness 100% 100% 100% Query Engine 0 1000 2000 3000 4000 5000 0.0 2.5 5.0 7.5 10.0 12.5 Time (sec.) #AnswersProduced nLDE Not Adaptive nLDE Selective nLDE Random Continuous PerformanceTraditional Metrics Overall, nLDE Random outperforms the other approaches. nLDE Not Adaptive outperforms the other approaches in the first 7.5 sec. of execution.
  • 5. Motivation (3): We need quantitative methods to measure the continuous efficiency of query processing approaches.
  • 7. Current Performance Metrics
    Effectiveness:
      Answer Completeness [Guo05] [Montoya12]
      Correctness [Zhang12]
      Answer Soundness [Guo05]
    Efficiency:
      Execution Time [Guo05] [Bizer09] [Montoya12] [Zhang12]
      Loading Time [Guo05]
      Throughput [Zhang12]
      Time for the First Tuple [Acosta11]
      Queries per Second [Bizer09]
      Average Slowdown [Sharaf08]
    Effectiveness and efficiency:
      Combined Metric [Guo05]
    These metrics do not consider continuous performance; they are not tailored to benchmark incremental approaches.
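To illustrate why such metrics can hide continuous behavior, here is a minimal Python sketch. The function name `traditional_metrics`, the two made-up traces, and the simplification that execution time equals the timestamp of the last answer are assumptions for illustration only; they do not come from the slides.

```python
def traditional_metrics(times):
    """Compute three summary metrics from a list of answer timestamps (sec.)."""
    ordered = sorted(times)
    execution_time = ordered[-1]  # assumption: execution ends with the last answer
    return {
        "time_first_answer": ordered[0],
        "execution_time": execution_time,
        "throughput": len(ordered) / execution_time,  # answers per second
    }


# Two hypothetical engines: one delivers most answers early, the other late.
early = [1.0, 2.0, 3.0, 10.0]
late = [1.0, 8.0, 9.0, 10.0]

print(traditional_metrics(early))
print(traditional_metrics(late))
```

Both traces yield the same three values even though the first engine delivers most of its answers much sooner, which is the kind of difference a continuous metric is meant to capture.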
  • 9. Diefficiency Metrics
    • Diefficiency: continuous efficiency.
    • The term combines the Greek prefix di(a)- (which means “through” or “across”) and efficiency.
    • The continuous performance of approaches is recorded in answer traces.
    • Our metrics quantify the diefficiency of incremental approaches.
    Answer trace:
      Answer                        Time
      {?d1 → dbr:Zuclopenthixol}    0.33
      {?d1 → dbr:Ziprepol}          0.35
      {?d1 → dbr:Viminol}           0.35
      {?d1 → dbr:Trifluperidol}     0.36
      {?d1 → dbr:Tolvaptan}         0.37
  • 10. Answer Distribution Function
    • Defined as $X: [0; t_n] \to \mathbb{N}$.
    • $t_n$ is the point in time when the last answer was produced.
    • $X(x)$ indicates the number of answers produced until time $x$.
    • $X$ is built from answer traces (applying linear interpolation).
    Answer trace:
      Answer                        Time
      {?d1 → dbr:Zuclopenthixol}    0.33
      {?d1 → dbr:Ziprepol}          0.35
      {?d1 → dbr:Viminol}           0.35
      {?d1 → dbr:Trifluperidol}     0.36
      {?d1 → dbr:Tolvaptan}         0.37
      …
    [Plot: answer distribution function for Q9.sparql; x-axis: Time, y-axis: # answers produced; series: nLDE Not Adaptive, nLDE Selective, nLDE Random.]
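As a minimal sketch of how an answer distribution function could be built from an answer trace, the Python helper below interpolates linearly between consecutive answers. The name `answer_distribution`, the choice of returning 0 before the first answer, and the constant value returned after the last answer are assumptions for illustration, not taken from the slides. The timestamps are the five listed in the answer trace.

```python
from bisect import bisect_right


def answer_distribution(times):
    """Return a function X(x) built from answer timestamps."""
    ordered = sorted(times)
    n = len(ordered)

    def X(x):
        # i = number of answers with timestamp <= x.
        i = bisect_right(ordered, x)
        if i == 0:
            return 0.0       # assumption: nothing produced before the first answer
        if i == n:
            return float(n)  # assumption: flat at n from the last answer onwards
        # Linear interpolation between answer i and answer i + 1.
        t0, t1 = ordered[i - 1], ordered[i]
        return i + (x - t0) / (t1 - t0)

    return X


trace = [0.33, 0.35, 0.35, 0.36, 0.37]
X = answer_distribution(trace)
for x in (0.20, 0.34, 0.35, 0.37):
    print(x, X(x))
```

Using `bisect_right` means that answers sharing a timestamp (two at 0.35 here) are counted together, so the interpolation never divides by a zero-length interval.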
  • 11. Metric dief@t
    • Quantifies diefficiency during the first $t$ time units of execution.
    • Measures the area under the curve of $X(x)$ in the interval $[0; t]$:
      $\mathrm{dief@}t := \int_{0}^{t} X(x)\,dx$
    dief@t interpretation: Higher is better.
    dief@t values shown on the slide:
      Not Adaptive   Selective   Random
      7323.46        1148.63     5031.90
    [Plot: # answers produced over time (sec.) for nLDE Not Adaptive, nLDE Selective, and nLDE Random.]
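The area under the curve can be approximated with the trapezoid rule over the points of an answer trace. The sketch below is one possible way to do this in Python, not the authors' implementation: the name `dief_at_t`, the made-up traces, and the choices to count no area before the first answer or after the last one are assumptions for illustration.

```python
def dief_at_t(times, t):
    """Sketch of dief@t: area under the answer curve on [0; t]."""
    ordered = sorted(times)
    # The i-th answer (1-based) raises the cumulative count to i.
    points = [(ts, i) for i, ts in enumerate(ordered, start=1)]
    area = 0.0
    for (t0, c0), (t1, c1) in zip(points, points[1:]):
        if t0 >= t:
            break
        if t1 > t:
            # Clip the last segment at t using linear interpolation.
            c1 = c0 + (c1 - c0) * (t - t0) / (t1 - t0)
            t1 = t
        area += (t1 - t0) * (c0 + c1) / 2.0
    return area


# Hypothetical traces with the same first and last answer times.
early = [1.0, 2.0, 3.0, 10.0]
late = [1.0, 8.0, 9.0, 10.0]
blocking = [10.0, 10.0, 10.0, 10.0]

print(dief_at_t(early, 10.0), dief_at_t(late, 10.0), dief_at_t(blocking, 10.0))
```

The engine that produces answers earlier accumulates more area, which matches the "higher is better" reading, and a trace whose answers all arrive at the same instant contributes no area at all.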
  • 12. Metric dief@k
    • Quantifies diefficiency while producing the first $k$ answers.
    • Measures the area under the curve of $X(x)$ in the interval $[0; t_k]$:
      $\mathrm{dief@}k := \int_{0}^{t_k} X(x)\,dx$
    • $t_k$ is the point in time when the $k$-th answer is produced.
    dief@k interpretation: Lower is better.
    k = 2000 (marked on the plot). dief@k values shown on the slide:
      Not Adaptive   Selective   Random
      4686.11        3235.67     3517.85
    [Plot: # answers produced over time (sec.) for nLDE Not Adaptive, nLDE Selective, and nLDE Random.]
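A small Python sketch of dief@k under the same trapezoid approximation follows. The name `dief_at_k`, the two made-up traces, and the choice to count no area before the first answer are assumptions for illustration rather than the authors' code.

```python
def dief_at_k(times, k):
    """Sketch of dief@k: area under the answer curve on [0; t_k]."""
    ordered = sorted(times)
    if not 1 <= k <= len(ordered):
        raise ValueError("k must be between 1 and the number of answers")
    area = 0.0
    for i in range(1, k):
        t_prev, t_next = ordered[i - 1], ordered[i]  # answers i and i + 1
        # Trapezoid between cumulative counts i and i + 1.
        area += (t_next - t_prev) * (i + (i + 1)) / 2.0
    return area


# Hypothetical traces: both engines produce 3 answers, "slow" needs longer.
fast = [1.0, 2.0, 3.0]
slow = [1.0, 4.0, 9.0]

print(dief_at_k(fast, 3), dief_at_k(slow, 3))
```

Because the number of answers is fixed at k, an engine that needs more time to reach the k-th answer stretches the curve along the time axis and accumulates more area, which is why lower values are better here.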
  • 13. Extensions of dief@t and dief@k
    Measuring diefficiency at any time interval
    • With dief@t it is possible to measure the diefficiency of an approach during the interval $[t_a; t_b]$, as follows:
      $\mathrm{dief@}t_b - \mathrm{dief@}t_a$
    Values shown on the slide:
      Not Adaptive   Selective   Random
      5073.37        869.18      4024.21
    [Plot: # answers produced over time for nLDE Not Adaptive, nLDE Selective, and nLDE Random.]
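The difference corresponds to the area under the curve restricted to the interval. As a short worked step, using only the definition of dief@t as an integral from 0 and assuming $0 \le t_a \le t_b$ with both points inside the domain of $X$:

```latex
\[
\mathrm{dief@}t_b - \mathrm{dief@}t_a
  = \int_{0}^{t_b} X(x)\,dx - \int_{0}^{t_a} X(x)\,dx
  = \int_{t_a}^{t_b} X(x)\,dx
\]
```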
  • 14. Extensions of dief@t and dief@k
    Measuring diefficiency between the $k_a$-th and $k_b$-th answers
    • With dief@k it is possible to measure the diefficiency of an approach while producing the answers from the $k_a$-th to the $k_b$-th (with $k_a \le k_b$), as follows:
      $\mathrm{dief@}k_b - \mathrm{dief@}k_a$
    Values shown on the slide:
      Not Adaptive   Selective   Random
      5847.05        5457.67     3468.71
    [Plot: # answers produced over time for nLDE Not Adaptive, nLDE Selective, and nLDE Random.]
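The subtraction can be sketched in a few lines of Python. The function names, the trapezoid approximation, and the made-up trace below are assumptions for illustration and are not taken from the slides.

```python
def dief_at_k(times, k):
    """Area under the answer curve up to the k-th answer (trapezoid rule)."""
    ts = sorted(times)[:k]
    # Segment i joins answer i (count i) and answer i + 1 (count i + 1).
    return sum((ts[i] - ts[i - 1]) * (2 * i + 1) / 2.0 for i in range(1, len(ts)))


def dief_between_answers(times, k_a, k_b):
    """dief@k_b - dief@k_a, which requires k_a <= k_b."""
    if not 1 <= k_a <= k_b <= len(times):
        raise ValueError("need 1 <= k_a <= k_b <= number of answers")
    return dief_at_k(times, k_b) - dief_at_k(times, k_a)


trace = [1.0, 2.0, 3.0, 5.0]  # made-up answer timestamps
print(dief_between_answers(trace, 2, 4))
```

As with dief@k itself, a lower value for the same pair of answer positions indicates that the engine moved from the earlier answer to the later one more quickly.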
  • 15. Properties of dief@t and dief@k
    Analytical Relationship Between dief@t and dief@k
    • Let $t_k$ be the point in time when the $k$-th answer is produced. Then:
      $\mathrm{dief@}t_k = \mathrm{dief@}k$
    • Theorem 1: The diefficiency of blocking approaches is always zero.
    • Theorem 2: In queries where the number of answers is greater than one, the total diefficiency of incremental approaches is higher than zero.
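A brief sketch of why Theorem 1 is plausible, under the assumptions that a blocking approach records all of its answers at one single point in time and that $X(x) = 0$ before any answer has been produced. The symbol $t^{*}$ for that point in time is introduced here only for illustration.

```latex
\[
X(x) = 0 \ \text{for all } x \in [0; t^{*})
\quad\Longrightarrow\quad
\mathrm{dief@}t = \int_{0}^{t} X(x)\,dx = 0 \ \text{for all } t \le t^{*}
\]
```

By the same reasoning, an incremental approach that produces more than one answer at different points in time has $X(x) > 0$ on an interval of non-zero length, so the area under its curve is positive.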
  • 17. Experimental Settings
    • Query engine: nLDE [Acosta15] with three configurations:
      • nLDE Not Adaptive (NA)
      • nLDE Selective (Sel)
      • nLDE Random (Ran)
    • Queries and dataset:
      • nLDE Benchmark 1: 16 non-selective queries (4–14 triple patterns)
      • DBpedia dataset (v. 2015)
    • Technical specifications: Debian Wheezy 64 bit with 2x Intel(R) Xeon(R) E5-2670 CPUs at 2.60GHz (16 physical cores) and 256GB RAM.
  • 18. Comparing dief@t with Other Metrics (1)
    Results for Query Q17.
    [Plot: # answers produced over time for Q17.sparql; series: nLDE Not Adaptive, nLDE Selective, nLDE Random.]
    [Plot: NA, Ran, and Sel compared on five axes: (Time for the First Tuple)^-1, (Execution Time)^-1, Completeness, Throughput, and dief@t.]
    Plot interpretation: Higher is better.
    Uncovered pattern: Ran outperforms NA.
  • 19. Comparing dief@t with Other Metrics (2): Queries in which dief@t uncovers unknown patterns.
  • 20. Measuring Answer Rate with dief@k (1)
    Results for Query Q2.
    [Plot: # answers produced over time for Q2.sparql; series: nLDE Not Adaptive, nLDE Selective, nLDE Random.]
    [Plot: dief@k of NA, Ran, and Sel at k=25%, k=50%, k=75%, and k=100%.]
    Plot interpretation: Lower is better.
    • Sel produces the first 25% of the answers slower than Ran.
    • Sel produces the last portions of the answer at a faster rate.
  • 21. Measuring Answer Rate with dief@k (2): Only in these queries, all the approaches produced results at a uniform rate.
  • 23. Conclusions
    dief@t and dief@k measure the diefficiency of incremental approaches.
    • We have demonstrated the theoretical soundness of the metrics.
    • Our empirical study indicates that dief@t and dief@k allow for uncovering performance particularities.
    • A final remark: dief@t and dief@k can measure the performance of any incremental approach.
      ✔ Streaming query processing
      ✔ Top-k
      ✔ Monotonic reasoning
      ✔ Crowdsourcing
  • 24. Available Resources
    • dief R package to compute dief@t and dief@k: https://github.com/maribelacosta/dief
    • Jupyter notebook: https://github.com/maribelacosta/dief-notebooks
    • Online demo: http://km.aifb.kit.edu/services/dief-app/
  • 25. References
    [Acosta15] M. Acosta and M.-E. Vidal. Networks of linked data eddies: An adaptive web query processing engine for RDF data. In ISWC, pages 111–127, 2015.
    [Acosta11] M. Acosta, M.-E. Vidal, J. Castillo, T. Lampo, and E. Ruckhaus. ANAPSID: An adaptive query processing engine for SPARQL endpoints. In ISWC, pages 18–34, 2011.
    [Bizer09] C. Bizer and A. Schultz. The Berlin SPARQL benchmark. Int. J. Semantic Web Inf. Syst., 5(2):1–24, 2009.
    [Guo05] Y. Guo, Z. Pan, and J. Heflin. LUBM: A benchmark for OWL knowledge base systems. Web Semant., 3(2-3):158–182, Oct. 2005.
    [Montoya12] G. Montoya, M.-E. Vidal, Ó. Corcho, E. Ruckhaus, and C. B. Aranda. Benchmarking federated SPARQL query engines: Are existing testbeds enough? In ISWC, pages 313–324, 2012.
    [Sharaf08] M. A. Sharaf, P. K. Chrysanthis, A. Labrinidis, and K. Pruhs. Algorithms and metrics for processing multiple heterogeneous continuous queries. ACM Trans. Database Syst., 33(1):5:1–5:44, 2008.
    [Zhang12] Y. Zhang, M. Pham, Ó. Corcho, and J. Calbimonte. SRBench: A streaming RDF/SPARQL benchmark. In ISWC, pages 641–657, 2012.
  • 26. Diefficiency Metrics: Measuring the Continuous Efficiency of Query Processing Approaches. Maribel Acosta, Maria-Esther Vidal, York Sure-Vetter.
    $\mathrm{dief@}t := \int_{0}^{t} X(x)\,dx$
    $\mathrm{dief@}k := \int_{0}^{t_k} X(x)\,dx$
    [Plot: # answers produced over time for nLDE Not Adaptive.]