During empirical evaluations of query processing techniques, metrics like execution time, time for the first answer, and throughput are usually reported. Albeit informative, these metrics cannot quantify the efficiency of a query engine over a certain time period (its diefficiency), which hampers the identification of cutting-edge engines able to deliver answers steadily over time. We tackle this issue and devise two experimental metrics, dief@t and dief@k, which measure the diefficiency during an elapsed time period t or while the first k answers are produced, respectively. Both metrics rely on computing the area under the curve of answer traces, thereby capturing the concentration of answers over a time interval. We report experimental results on the behavior of a generic SPARQL query engine using both metrics. The observed results suggest that dief@t and dief@k measure the performance of SPARQL query engines based on both the number of answers produced by an engine and the time required to generate these answers.
2. Motivation (1)

Query: Retrieve resources classified as DBpedia Alcohols that have SMILES identifiers.

SELECT ?d1 WHERE {
?d1 dcterms:subject dbc:Alcohols .
?d1 dbp:smiles ?s .}

Blocking Approach: produces all results at the end of execution.
Input → Query Engine → Output

Answer                         Time
{?d1 → dbr:Zuclopenthixol}     0.37 sec.
{?d1 → dbr:Ziprepol}           0.37 sec.
{?d1 → dbr:Viminol}            0.37 sec.
{?d1 → dbr:Trifluperidol}      0.37 sec.
{?d1 → dbr:Trabectedin}        0.37 sec.
{?d1 → dbr:Tolvaptan}          0.37 sec.
3. Motivation (1)

Query: Retrieve resources classified as DBpedia Alcohols that have SMILES identifiers.

SELECT ?d1 WHERE {
?d1 dcterms:subject dbc:Alcohols .
?d1 dbp:smiles ?s .}

Incremental Approach: produces results as soon as they are ready, e.g., ANAPSID, nLDE, TPF Client.
Input → Query Engine → Output

Answer                         Time
{?d1 → dbr:Zuclopenthixol}     0.33 sec.
{?d1 → dbr:Ziprepol}           0.35 sec.
{?d1 → dbr:Viminol}            0.35 sec.
{?d1 → dbr:Trifluperidol}      0.36 sec.
{?d1 → dbr:Trabectedin}        0.36 sec.
{?d1 → dbr:Tolvaptan}          0.37 sec.
4. Motivation (2)

Traditional Metrics:

Metric                        nLDE Not Adaptive   nLDE Selective   nLDE Random
Time First Answer (sec.)      0.37                0.24             0.33
Execution Time (sec.)         10.59               12.10            9.30
Throughput (answers/sec.)     486.27              421.87           553.66
Completeness                  100%                100%             100%

Overall, nLDE Random outperforms the other approaches.

Continuous Performance:

[Plot: #Answers Produced (0-5000) vs. Time (0-12.5 sec.) for nLDE Not Adaptive, nLDE Selective, nLDE Random]

nLDE Not Adaptive outperforms the other approaches in the first 7.5 sec. of execution.
7. Current Performance Metrics

Effectiveness:
• Answer Completeness [Guo05] [Montoya12]
• Correctness [Zhang12]
• Answer Soundness [Guo05]

Efficiency:
• Execution Time [Guo05] [Bizer09] [Montoya12] [Zhang12]
• Loading Time [Guo05]
• Throughput [Zhang12]
• Time for the First Tuple [Acosta11]
• Queries per Second [Bizer09]
• Average Slowdown [Sharaf08]

Combined Metric [Guo05]

These metrics do not consider continuous performance; they are not tailored to benchmark incremental approaches.
9. Diefficiency Metrics
• Diefficiency: continuous efficiency.
• Combination of the Greek prefix di(a)- (which means “through” or
“across”) and efficiency.
• Continuous performance of approaches is recorded in answer traces.
• Our metrics quantify the diefficiency of incremental approaches.
Answer trace:
Answer                        Time (sec.)
{?d1 → dbr:Zuclopenthixol}    0.33
{?d1 → dbr:Ziprepol}          0.35
{?d1 → dbr:Viminol}           0.35
{?d1 → dbr:Trifluperidol}     0.36
{?d1 → dbr:Tolvaptan}         0.37
10. Answer Distribution Function

• Defined as X: [0; t_n] → ℕ.
• t_n is the point in time when the last answer was produced.
• X(x) indicates the number of answers produced until time x.
• X is built from answer traces (applying linear interpolation).

[Plot for Q9.sparql: answer distribution functions, #Answers Produced (0-5000) vs. Time (0-12.5) for nLDE Not Adaptive, nLDE Selective, nLDE Random]

Answer trace:
Answer                        Time (sec.)
{?d1 → dbr:Zuclopenthixol}    0.33
{?d1 → dbr:Ziprepol}          0.35
{?d1 → dbr:Viminol}           0.35
{?d1 → dbr:Trifluperidol}     0.36
{?d1 → dbr:Tolvaptan}         0.37
…
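The answer distribution function above can be sketched in plain Python. This is an illustrative implementation (not the official dief R package), assuming the answer trace is given as a sorted list of answer timestamps in seconds:

```python
# Sketch: build the answer distribution function X from an answer trace,
# linearly interpolating the cumulative answer count between trace points.
def answer_distribution(trace):
    # The i-th answer (1-based) arrives at trace[i-1]; X(0) = 0.
    points = [(0.0, 0.0)] + [(t, i + 1) for i, t in enumerate(trace)]

    def X(x):
        if x >= points[-1][0]:
            return points[-1][1]  # all answers produced after t_n
        for (t0, c0), (t1, c1) in zip(points, points[1:]):
            if t0 <= x <= t1:
                if t1 == t0:  # several answers at the same timestamp
                    return c1
                return c0 + (c1 - c0) * (x - t0) / (t1 - t0)
        return 0.0

    return X

trace = [0.33, 0.35, 0.35, 0.36, 0.37]  # answer trace from the slide
X = answer_distribution(trace)
print(X(0.36))  # → 4.0 (four answers produced by t = 0.36 sec.)
```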
11. Metric dief@t

• Quantifies diefficiency during the first t time units of execution.
• Measures the area under the curve of X(x) in the interval [0; t]:

  dief@t := ∫₀^t X(x) dx

• dief@t interpretation: higher is better.

[Plot: #Answers Produced (0-5000) vs. Time (0-12.5 sec.) for nLDE Not Adaptive, nLDE Selective, nLDE Random]

Results (dief@t):
nLDE Not Adaptive   nLDE Selective   nLDE Random
7323.46             1148.63          5031.90
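Given the definition above, dief@t can be computed with the trapezoid rule over the piecewise-linear answer trace. The sketch below is illustrative (the trace data is hypothetical, and X is assumed to stay flat between the last answer produced before t and t itself):

```python
# Minimal sketch of dief@t: area under the linearly interpolated
# answer trace over [0, t], computed with the trapezoid rule.
def dief_at_t(trace, t):
    # Piecewise-linear points of X restricted to [0, t].
    pts = [(0.0, 0.0)] + [(x, i + 1) for i, x in enumerate(trace) if x <= t]
    # Assumption: X stays flat (no new answers) from the last answer to t.
    pts.append((t, pts[-1][1]))
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

trace = [0.5, 1.0, 1.5, 2.0]  # hypothetical answer trace
print(dief_at_t(trace, 2.0))  # → 4.0
```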
12. Metric dief@k

• Quantifies diefficiency while producing the first k answers.
• Measures the area under the curve of X(x) in the interval [0; t_k]:

  dief@k := ∫₀^t_k X(x) dx

• t_k is the point in time when the k-th answer is produced.
• dief@k interpretation: lower is better.

[Plot: #Answers Produced (0-5000) vs. Time (0-12.5 sec.) for nLDE Not Adaptive, nLDE Selective, nLDE Random, with k = 2000 marked]

Results (dief@k, k = 2000):
nLDE Not Adaptive   nLDE Selective   nLDE Random
4686.11             3235.67          3517.85
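Analogously to dief@t, dief@k integrates the interpolated trace only up to t_k, the time of the k-th answer. This is an illustrative sketch with hypothetical trace data, not the official implementation:

```python
# Minimal sketch of dief@k: area under the linearly interpolated
# answer trace over [0, t_k], where t_k is the k-th answer's timestamp.
def dief_at_k(trace, k):
    if k > len(trace):
        raise ValueError("fewer than k answers were produced")
    pts = [(0.0, 0.0)] + [(x, i + 1) for i, x in enumerate(trace[:k])]
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

trace = [0.5, 1.0, 1.5, 2.0]  # hypothetical answer trace
print(dief_at_k(trace, 2))  # → 1.0 (area over [0, t_2] = [0, 1.0])
```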
13. Extensions of dief@t and dief@k

Measuring diefficiency at any time interval

• With dief@t it is possible to measure the diefficiency of an approach during the interval [t_a; t_b], as follows:

  dief@t_b − dief@t_a

[Plot: #Answers Produced (0-5000) vs. Time (0-12.5) for nLDE Not Adaptive, nLDE Selective, nLDE Random]

Results (dief@t_b − dief@t_a):
nLDE Not Adaptive   nLDE Selective   nLDE Random
5073.37             869.18           4024.21
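The interval extension is just a difference of two dief@t values. The sketch below reuses a trapezoid-rule dief@t helper (repeated so the example is self-contained; trace data is hypothetical):

```python
# Sketch: diefficiency over an arbitrary interval [t_a, t_b],
# computed as dief@t_b - dief@t_a.
def dief_at_t(trace, t):
    pts = [(0.0, 0.0)] + [(x, i + 1) for i, x in enumerate(trace) if x <= t]
    pts.append((t, pts[-1][1]))  # assume X flat after the last answer <= t
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

def dief_interval(trace, t_a, t_b):
    return dief_at_t(trace, t_b) - dief_at_t(trace, t_a)

trace = [0.5, 1.0, 1.5, 2.0]  # hypothetical answer trace
print(dief_interval(trace, 1.0, 2.0))  # → 3.0 (= 4.0 - 1.0)
```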
14. Extensions of dief@t and dief@k

Measuring diefficiency between the k_a-th and k_b-th answers

• With dief@k it is possible to measure the diefficiency of an approach while producing the answers k_a through k_b (with k_a ≤ k_b), as follows:

  dief@k_b − dief@k_a

[Plot: #Answers Produced (0-5000) vs. Time (0-12.5) for nLDE Not Adaptive, nLDE Selective, nLDE Random]

Results (dief@k_b − dief@k_a):
nLDE Not Adaptive   nLDE Selective   nLDE Random
5847.05             5457.67          3468.71
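The answer-interval extension mirrors the time-interval one as a difference of two dief@k values. Again an illustrative sketch with hypothetical data, repeating the helper for self-containment:

```python
# Sketch: diefficiency while producing answers k_a through k_b,
# computed as dief@k_b - dief@k_a.
def dief_at_k(trace, k):
    pts = [(0.0, 0.0)] + [(x, i + 1) for i, x in enumerate(trace[:k])]
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

def dief_k_interval(trace, k_a, k_b):
    assert k_a <= k_b, "requires k_a <= k_b"
    return dief_at_k(trace, k_b) - dief_at_k(trace, k_a)

trace = [0.5, 1.0, 1.5, 2.0]  # hypothetical answer trace
print(dief_k_interval(trace, 2, 4))  # → 3.0 (= 4.0 - 1.0)
```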
15. Properties of dief@t and dief@k

Analytical relationship between dief@t and dief@k: let t_k be the point in time when the k-th answer is produced; then

  dief@t_k = dief@k

Theorem 1: The diefficiency of blocking approaches is always zero.

Theorem 2: In queries where the number of answers is greater than one, the total diefficiency of incremental approaches is higher than zero.
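The relationship dief@t_k = dief@k can be checked numerically with trapezoid-rule sketches of both metrics over the same interpolated trace (helpers repeated for self-containment; trace data is hypothetical):

```python
# Numerical check of the relationship dief@t_k = dief@k.
def _auc(pts):
    # Trapezoid-rule area under a piecewise-linear curve.
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

def dief_at_t(trace, t):
    pts = [(0.0, 0.0)] + [(x, i + 1) for i, x in enumerate(trace) if x <= t]
    pts.append((t, pts[-1][1]))
    return _auc(pts)

def dief_at_k(trace, k):
    return _auc([(0.0, 0.0)] + [(x, i + 1) for i, x in enumerate(trace[:k])])

trace = [0.33, 0.35, 0.35, 0.36, 0.37]
k = 3
t_k = trace[k - 1]  # time at which the k-th answer is produced
print(dief_at_t(trace, t_k) == dief_at_k(trace, k))  # → True
```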
17. Experimental Settings
• Query engine: nLDE [Acosta15] with three configurations:
• nLDE Not Adaptive (NA)
• nLDE Selective (Sel)
• nLDE Random (Ran)
• Queries and dataset:
• nLDE Benchmark 1: 16 non-selective queries (4–14 triple patterns)
• DBpedia dataset (v. 2015)
• Technical specifications: Debian Wheezy 64-bit, 2x Intel(R) Xeon(R) E5-2670 CPUs @ 2.60GHz (16 physical cores), 256GB RAM.
18. Comparing dief@t with Other Metrics (1)

Results for Query Q17. Plot interpretation: higher is better.

[Answer trace plot for Q17.sparql: #Answers Produced (0-10000) vs. Time (0-60) for nLDE Not Adaptive, nLDE Selective, nLDE Random]

[Radar plots for NA, Sel, Ran over: (Time for the First Tuple)⁻¹, (Execution Time)⁻¹, Completeness, Throughput, dief@t]

Uncovered pattern: Ran outperforms NA.
20. Measuring Answer Rate with dief@k (1)

Results for Query Q2. Plot interpretation: lower is better.

[Answer trace plot for Q2.sparql: #Answers Produced (5-15) vs. Time (0.6-1.5 sec.) for nLDE Not Adaptive, nLDE Selective, nLDE Random]

[Radar plots for NA, Sel, Ran over dief@k at k = 25%, 50%, 75%, 100% of the answers]

Uncovered patterns: Sel produces the first 25% of the answers slower than Ran; Sel produces the last portions of the answers at a faster rate.
23. Conclusions

dief@t and dief@k: measure the diefficiency of incremental approaches.
• We have demonstrated the theoretical soundness of the metrics.
• Our empirical study indicates that dief@t and dief@k allow for uncovering performance particularities.
• A final remark: dief@t and dief@k can measure the performance of any incremental approach.
  ✔ Streaming query processing ✔ Top-k ✔ Monotonic reasoning ✔ Crowdsourcing
24. Available Resources

• dief R package to compute dief@t and dief@k:
  https://github.com/maribelacosta/dief
• Jupyter notebooks:
  https://github.com/maribelacosta/dief-notebooks
• Online demo:
  http://km.aifb.kit.edu/services/dief-app/
25. References
[Acosta15] M. Acosta and M.-E. Vidal. Networks of linked data eddies: An adaptive web query
processing engine for RDF data. In ISWC, pages 111–127, 2015.
[Acosta11] M. Acosta, M.-E. Vidal, J. Castillo, T. Lampo, and E. Ruckhaus. ANAPSID: An adaptive
query processing engine for SPARQL endpoints. In ISWC, pages 18–34, 2011.
[Bizer09] C. Bizer and A. Schultz. The Berlin SPARQL benchmark. Int. J. Semantic Web Inf. Syst.,
5(2):1–24, 2009.
[Guo05] Y. Guo, Z. Pan, and J. Heflin. LUBM: A benchmark for OWL knowledge base systems. Web
Semant., 3(2-3):158–182, Oct. 2005.
[Montoya12] G. Montoya, M.-E. Vidal, Ó. Corcho, E. Ruckhaus, and C. B. Aranda. Benchmarking
federated SPARQL query engines: Are existing testbeds enough? In ISWC, pages 313–324, 2012.
[Sharaf08] M. A. Sharaf, P. K. Chrysanthis, A. Labrinidis, and K. Pruhs. Algorithms and metrics for
processing multiple heterogeneous continuous queries. ACM Trans. Database Syst., 33(1):5:1–5:44,
2008.
[Zhang12] Y. Zhang, M. Pham, Ó. Corcho, and J. Calbimonte. SRBench: A streaming RDF/SPARQL
benchmark. In ISWC, pages 641–657, 2012.
26. Diefficiency Metrics: Measuring the Continuous Efficiency of Query Processing Approaches

Maribel Acosta, Maria-Esther Vidal, York Sure-Vetter

  dief@t := ∫₀^t X(x) dx        dief@k := ∫₀^t_k X(x) dx

[Plot: #Answers Produced (0-5000) vs. Time (0-10) for nLDE Not Adaptive]