Muhammad Saleem , Claus Stadler, Qaiser Mehmood, Jens
Lehmann, Axel-Cyrille Ngonga Ngomo
(K-Cap 2017, Austin, USA)
AKSW, University of Leipzig, Germany
DICE, University of Paderborn,
Germany
SDA, University of Bonn, Germany
1
• Query containment
• Why SQCFramework?
• SQCFramework
• Input queries
• Important query features
• Benchmark generation
• Benchmark personalization
• Evaluation and results
• Conclusion
2
Deciding whether the result set of one query is included in the
result set of another, for every possible dataset.
3
Formally:
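The formula on this slide did not survive extraction. The standard set-semantics definition of containment, which the sentence above paraphrases, can be sketched as follows. The symbols Q_1 (sub-query), Q_2 (super-query), D (a dataset) and the evaluation brackets are notation assumed here, not taken from the slide.

```latex
Q_1 \sqsubseteq Q_2
\;\iff\;
\forall D :\; [\![ Q_1 ]\!]_D \subseteq [\![ Q_2 ]\!]_D
```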
• Query optimization
• Caching mechanisms
• Data integration
• View maintenance
• Query rewriting
4
• Real data
• Real log queries
• Flexible
• Customizable
• Use-case specific
5
6
SPARQL queries
Selection criteria
Containment benchmark
1. Selection of super-queries
2. Normalization of feature vectors
3. Generation of clusters
4. Selection of most representative queries
• Manually provided by user
• Selection from LSQ
• Linked SPARQL Queries datasets
• Extracted from endpoint query logs
• Structural and data-driven statistics
7
20 datasets available from (http://hobbitdata.informatik.uni-leipzig.de/lsq-dumps/)
• Number of entailments/sub-queries
• Number of projection variables
• Number of BGPs
• Number of triple patterns
• Max. number of BGP triple patterns
• Min. number of BGP triple patterns
• Number of join vertices
• Mean join vertex degree
• Number of LSQ features
8
1. Selection of super-queries
2. Normalized feature vectors
3. Generation of clusters
4. Selection of most representative queries
9
10
11
F = feature vector, M = max. feature vector, F/M = normalized feature vector

Feature                                 F       M       F/M
Number of entailments/sub-queries       2       10      0.2
Number of projection variables          2       8       0.25
Number of BGPs                          1       6       0.16
Number of triple patterns               5       12      0.41
Max. number BGP triple patterns         5       5       1
Min. number BGP triple patterns         5       10      0.5
Number of join vertices                 3       10      0.33
Mean join vertex degree                 2.3     5       0.46
Number of LSQ features                  2       30      0.06
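The F/M normalization above can be sketched in a few lines of Python. The vectors are copied from the slide; the function name `normalize` and the zero-maximum handling are assumptions made for illustration. Note that 3/10 evaluates to 0.3, whereas the slide lists 0.33 for the join-vertices entry.

```python
# Feature vector F of one query and the per-feature maxima M over the
# whole query set, copied from the slide.
F = [2, 2, 1, 5, 5, 5, 3, 2.3, 2]
M = [10, 8, 6, 12, 5, 10, 10, 5, 30]


def normalize(features, maxima):
    """Divide each feature by its maximum so every value lies in [0, 1]."""
    # Assumption: a zero maximum means the feature is 0 for every query,
    # so it is mapped to 0.0 instead of dividing by zero.
    return [f / m if m else 0.0 for f, m in zip(features, maxima)]


normalized = normalize(F, M)
print([round(v, 2) for v in normalized])
```

Dividing by the maximum keeps all features on the same [0, 1] scale, so no single feature dominates the distances used in the clustering step.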
• FEASIBLE
• FEASIBLE-Exemplars
• KMeans++
• DBSCAN+KMeans++
• Agglomerative
• Random selection
12
13
Plot normalized feature vectors in a multidimensional space

Query   F1      F2
Q1      0.2     0.2
Q2      0.5     0.3
Q3      0.8     0.3
Q4      0.9     0.1
Q5      0.5     0.5
Q6      0.2     0.7
Q7      0.1     0.8
Q8      0.13    0.65
Q9      0.9     0.5
Q10     0.1     0.5

Suppose we need a benchmark of 3 queries
[Figure: scatter plot of queries Q1 to Q10 by their F1 and F2 values]
14
15
[Figure: scatter plot of Q1 to Q10 with the average (Avg.) of each of the three clusters marked]
Calculate the average across each cluster
16
[Figure: scatter plot of Q1 to Q10 with the three cluster averages (Avg.)]
Calculate the distance of each point in a cluster to that cluster's average
17
[Figure: scatter plot of Q1 to Q10 with the three cluster averages (Avg.)]
Select the query at minimum distance from the average as the final
benchmark query from that cluster
The purple point, i.e., Q2, is the final selected query from the yellow cluster
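The three steps above (average per cluster, distance to the average, pick the nearest query) can be sketched as follows. The points come from the slide's table. The cluster assignment is a hypothetical one, since the slides show clusters only by colour, and the helper names `centroid` and `most_representative` are illustrative.

```python
import math

# Normalized feature vectors (F1, F2) from the slide's table.
points = {
    "Q1": (0.2, 0.2), "Q2": (0.5, 0.3), "Q3": (0.8, 0.3), "Q4": (0.9, 0.1),
    "Q5": (0.5, 0.5), "Q6": (0.2, 0.7), "Q7": (0.1, 0.8), "Q8": (0.13, 0.65),
    "Q9": (0.9, 0.5), "Q10": (0.1, 0.5),
}

# ASSUMPTION: a hypothetical assignment into 3 clusters; the real
# membership produced by the clustering step may differ.
clusters = [
    ["Q1", "Q2", "Q5"],
    ["Q3", "Q4", "Q9"],
    ["Q6", "Q7", "Q8", "Q10"],
]


def centroid(names):
    """Average of the feature vectors of the named queries."""
    xs = [points[n][0] for n in names]
    ys = [points[n][1] for n in names]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def most_representative(names):
    """Query with the smallest Euclidean distance to the cluster average."""
    c = centroid(names)
    return min(names, key=lambda n: math.dist(points[n], c))


benchmark = [most_representative(c) for c in clusters]
print(benchmark)  # ['Q2', 'Q3', 'Q8']
```

With this assumed assignment, one query per cluster yields the requested benchmark of 3 queries, and Q2 is the pick from its cluster, as on the slide.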
• Number of projection variables in the super-queries should be at most 2
• Number of BGPs should be greater than 1 or the number of triple patterns
should be greater than 3
• Benchmark should be selected from the most recently executed 1000 queries
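The three personalization criteria above amount to a filter over the candidate super-queries. A minimal sketch, assuming each query is available as a record of its statistics: the `QueryStats` class, its field names, and the `personalize` function are hypothetical and are not the framework's actual interface.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class QueryStats:
    # Hypothetical record; the field names are illustrative only.
    query_id: str
    projection_vars: int
    bgps: int
    triple_patterns: int
    executed_at: datetime


def personalize(log, recent=1000, max_proj_vars=2, bgps_above=1, tps_above=3):
    """Apply the three example criteria from the slide to a query log."""
    # Criterion 3: restrict to the most recently executed queries.
    latest = sorted(log, key=lambda q: q.executed_at, reverse=True)[:recent]
    return [
        q for q in latest
        # Criterion 1: at most `max_proj_vars` projection variables.
        if q.projection_vars <= max_proj_vars
        # Criterion 2: more than 1 BGP OR more than 3 triple patterns.
        and (q.bgps > bgps_above or q.triple_patterns > tps_above)
    ]


log = [
    QueryStats("a", 1, 2, 2, datetime(2017, 1, 1)),  # kept: more than 1 BGP
    QueryStats("b", 3, 2, 5, datetime(2017, 1, 2)),  # dropped: 3 projection variables
    QueryStats("c", 2, 1, 4, datetime(2017, 1, 3)),  # kept: more than 3 triple patterns
    QueryStats("d", 2, 1, 3, datetime(2017, 1, 4)),  # dropped: neither condition holds
]
selected = [q.query_id for q in personalize(log)]
print(selected)  # ['c', 'a']
```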
18
19
• Similarity error
• Diversity score
L is the query log, B is the benchmark,
and k is the set of all features
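The formulas for both measures were images and are missing here. The sketch below uses assumed definitions, not ones recovered from the slide: similarity error as the mean squared difference between the per-feature means of L and B, and diversity score as the mean coefficient of variation (standard deviation divided by mean) of the features in B. Function names are illustrative.

```python
from statistics import mean, pstdev


def column(rows, i):
    """Values of feature i across all feature vectors."""
    return [r[i] for r in rows]


def similarity_error(L, B):
    """ASSUMED definition: mean squared difference of per-feature means."""
    k = len(L[0])
    return mean([(mean(column(L, i)) - mean(column(B, i))) ** 2
                 for i in range(k)])


def diversity_score(B):
    """ASSUMED definition: mean coefficient of variation over all features."""
    k = len(B[0])
    scores = []
    for i in range(k):
        mu = mean(column(B, i))
        # A feature with zero mean contributes no diversity in this sketch.
        scores.append(pstdev(column(B, i)) / mu if mu else 0.0)
    return mean(scores)


log_vectors = [(0.2, 0.2), (0.4, 0.6), (0.9, 0.1)]
bench_vectors = [(0.2, 0.2), (0.9, 0.1)]
print(similarity_error(log_vectors, bench_vectors))
print(diversity_score(bench_vectors))
```

Under these assumptions a lower similarity error means the benchmark resembles the log more closely, and a higher diversity score means its queries vary more across features.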
• We compared:
  • FEASIBLE
  • FEASIBLE-Exemplars
  • KMeans++
  • DBSCAN+KMeans++
  • Random selection
• Number of containment tests (#T)
• Benchmark generation time (G) in seconds
20
• Query Mixes per Hour (QMpH)
• Number of handled test cases
• Number of timed-out test cases
• We compared:
  • TreeSolver
  • AFMU
  • SPARQL-Algebra
  • JSAC
We generated benchmarks using the Semantic Web
Dog Food (SWDF) and DBpedia query logs
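QMpH can be computed from measured runtimes. A minimal sketch, assuming a query mix is one complete run over all benchmark test cases and that its wall-clock runtime is recorded in seconds; the function name `qmph` is illustrative.

```python
def qmph(mix_runtimes_seconds):
    """Query mixes per hour, given the runtime in seconds of each executed mix."""
    total = sum(mix_runtimes_seconds)
    if total <= 0:
        raise ValueError("total runtime must be positive")
    # Number of completed mixes, scaled from the measured time to one hour.
    return len(mix_runtimes_seconds) * 3600.0 / total


print(qmph([1800, 1800]))  # 2.0
```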
21
[Figure: similarity error versus number of super-queries for FEASIBLE, FEASIBLE-Exemplars, KMeans++, DBSCAN+KMeans++, and Random selection; panels (SWDF) and (DBpedia)]
• In general, similarity error decreases as benchmark size increases
• In general, random selection generates benchmarks with smaller similarity
errors
22
[Figure: diversity score versus number of super-queries for FEASIBLE, FEASIBLE-Exemplars, KMeans++, DBSCAN+KMeans++, and Random selection; panels (SWDF) and (DBpedia)]
• In general, diversity score decreases as benchmark size increases
• FEASIBLE-Exemplars generates the most diverse benchmarks
23
• No significant differences
24
[Figure: normalized standard deviation per query feature, SQCFramework-FEASIBLE-Exemplars versus SQC-Benchmark*]
• SQCFramework-FEASIBLE-Exemplars is more diverse
across the majority of the query features
*SQC-Benchmark: http://sparql-qc-bench.inrialpes.fr/
25
• JSAC correctly handled all test cases with reasonable QMpH
[Figure: QMpH of TreeSolver, AFMU, JSAC, and SPARQL-Algebra]

Solver            Total tests   #Handled tests   #Correct tests   #Timeout tests
TreeSolver        1192          5                5                2
AFMU              1192          5                5                12
SPARQL-Algebra    1192          0                0                0
JSAC              1192          1192             1192             0
26
• SQCFramework:
  • Based on real data, real log queries
  • Flexible
  • Customizable
  • Use-case specific
• In general, similarity error decreases as benchmark size increases
• In general, random selection generates benchmarks with smaller similarity errors
• In general, diversity score decreases as benchmark size increases
• FEASIBLE-Exemplars generates the most diverse benchmarks
• JSAC correctly handled all test cases with reasonable QMpH
• SQCFramework available from (https://github.com/dice-group/sqcframework)
27
Thanks!
saleem@informatik.uni-leipzig.de
28

SQCFramework: SPARQL Query containment Benchmark Generation Framework

  • 1. Muhammad Saleem , Claus Stadler, Qaiser Mehmood, Jens Lehmann, Axel-Cyrille Ngonga Ngomo (K-Cap 2017, Austin, USA) AKSW, University of Leipzig, Germany DICE, University of Paderborn, Germany SDA, University of Bonn, Germany 1
  • 2.  Query containment  Why SQCFramework?  SQCFramework  Input queries  Important query features  Benchmark generation  Benchmark personalization  Evaluation and results  Conclusion 2
  • 3. Deciding whether the result set of one query is included in the result set of another? 3 Formally:
  • 4.  Query optimization  Caching mechanisms  Data integration  View maintenance  Query rewriting 4
  • 5.  Real data  Real log queries  Flexible  Customizable  Use-case specific 5
  • 6. 6 SPARQ L queries Selection criteria Containment benchmark 1. Selection of super-queries 2. Normalization of feature vectors 3. Generation of clusters 4. Selection of most representative queries
  • 7.  Manually provided by user  Selection from LSQ  Linked SPARQL Queries datasets  Extracted from endpoint queries log  Structural and data-driven statistics 7 20 datasets available from (http://hobbitdata.informatik.uni-leipzig.de/lsq-dumps/)
  • 8. Number of entailments/sub-queries; Number of projection variables; Number of BGPs; Number of triple patterns; Max. number BGP triple patterns; Min. number BGP triple patterns; Number of join vertices; Mean join vertex degree; Number of LSQ features
  • 9. 1. Selection of super-queries; 2. Normalized feature vectors; 3. Generation of clusters; 4. Selection of most representative queries
  • 11. Feature vector (F), max. feature vector (M), and normalized feature vector (F/M):
      Feature                              F      M      F/M
      Number of entailments/sub-queries    2      10     0.2
      Number of projection variables       2      8      0.25
      Number of BGPs                       1      6      0.16
      Number of triple patterns            5      12     0.41
      Max. number BGP triple patterns      5      5      1
      Min. number BGP triple patterns      5      10     0.5
      Number of join vertices              3      10     0.33
      Mean join vertex degree              2.3    5      0.46
      Number of LSQ features               2      30     0.06
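The normalization on slide 11 divides each feature value by the maximum value of that feature. A minimal Python sketch follows; the function names `max_feature_vector` and `normalize` are hypothetical and not taken from SQCFramework, and the guard against a zero maximum is an added assumption.

```python
from typing import List, Sequence


def max_feature_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise maximum over all feature vectors (the slide's M)."""
    return [max(column) for column in zip(*vectors)]


def normalize(feature_vector: Sequence[float],
              max_vector: Sequence[float]) -> List[float]:
    """Divide each feature by its maximum (the slide's F/M).

    A feature whose maximum is 0 is mapped to 0.0 to avoid division by zero
    (an assumption; the slides do not cover this case).
    """
    return [f / m if m else 0.0 for f, m in zip(feature_vector, max_vector)]


# Values of F and M as listed on slide 11.
F = [2, 2, 1, 5, 5, 5, 3, 2.3, 2]
M = [10, 8, 6, 12, 5, 10, 10, 5, 30]

normalized = normalize(F, M)
print([round(x, 2) for x in normalized])
```

The slide's F/M values appear to be truncated to two decimals rather than rounded, so the printed values differ slightly in the last digit. For the join-vertex feature, 3 divided by 10 gives 0.3, whereas the slide lists 0.33.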
  • 12. FEASIBLE; FEASIBLE-Exemplars; KMeans++; DBSCAN+KMeans++; Agglomerative; Random selection
  • 13. Plot normalized feature vectors in a multidimensional space. Suppose we need a benchmark of 3 queries.
      Query   F1     F2
      Q1      0.2    0.2
      Q2      0.5    0.3
      Q3      0.8    0.3
      Q4      0.9    0.1
      Q5      0.5    0.5
      Q6      0.2    0.7
      Q7      0.1    0.8
      Q8      0.13   0.65
      Q9      0.9    0.5
      Q10     0.1    0.5
      [Figure: scatter plot of Q1 to Q10 over F1 and F2]
  • 15. Calculate the average across each cluster. [Figure: scatter plot of Q1 to Q10 with the average of each cluster marked]
  • 16. Calculate the distance of each point in a cluster to the average. [Figure: same scatter plot]
  • 17. Select the minimum-distance query as the final benchmark query from that cluster. Purple, i.e., Q2, is the final selected query from the yellow cluster. [Figure: same scatter plot]
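Slides 15 to 17 can be sketched in Python as follows. The query coordinates come from slide 13, but the cluster assignment is an assumption: the transcript does not preserve which queries fall into which cluster, so the grouping below is a plausible reading of the plot, and the names `centroid` and `representative` are hypothetical. Clustering itself (e.g. KMeans++) is not shown.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Normalized feature vectors (F1, F2) from slide 13.
QUERIES: Dict[str, Point] = {
    "Q1": (0.2, 0.2), "Q2": (0.5, 0.3), "Q3": (0.8, 0.3), "Q4": (0.9, 0.1),
    "Q5": (0.5, 0.5), "Q6": (0.2, 0.7), "Q7": (0.1, 0.8), "Q8": (0.13, 0.65),
    "Q9": (0.9, 0.5), "Q10": (0.1, 0.5),
}

# ASSUMED cluster assignment; the slides show it only as colors in a figure.
CLUSTERS: List[List[str]] = [
    ["Q1", "Q2", "Q5"],
    ["Q3", "Q4", "Q9"],
    ["Q6", "Q7", "Q8", "Q10"],
]


def centroid(points: List[Point]) -> Point:
    """Average of the points in one cluster (slide 15)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def representative(cluster: List[str]) -> str:
    """Query closest to the cluster average (slides 16 and 17)."""
    center = centroid([QUERIES[q] for q in cluster])
    return min(cluster, key=lambda q: math.dist(QUERIES[q], center))


benchmark = [representative(c) for c in CLUSTERS]
print(benchmark)
```

Under this assumed grouping, the first cluster yields Q2, which agrees with the query named on slide 17.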
  • 18. Benchmark personalization: the number of projection variables in the super-queries should be at most 2; the number of BGPs should be greater than 1 or the number of triple patterns should be greater than 3; the benchmark should be selected from the 1000 most recently executed queries
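The three personalization criteria on slide 18 amount to a filter over the query log. The sketch below expresses them in Python under stated assumptions: the `LoggedQuery` record, its field names, and the `personalize` function are hypothetical, and it is not a description of how SQCFramework itself accepts selection criteria.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LoggedQuery:
    """Hypothetical record holding the features the criteria refer to."""
    query_id: str
    projection_vars: int
    bgps: int
    triple_patterns: int
    executed_at: int  # larger value = executed more recently


def personalize(log: List[LoggedQuery], recent: int = 1000) -> List[LoggedQuery]:
    """Keep candidate super-queries that satisfy the slide-18 criteria."""
    # Restrict to the most recently executed queries first.
    newest = sorted(log, key=lambda q: q.executed_at, reverse=True)[:recent]
    return [
        q for q in newest
        if q.projection_vars <= 2
        and (q.bgps > 1 or q.triple_patterns > 3)
    ]


log = [
    LoggedQuery("a", projection_vars=1, bgps=2, triple_patterns=2, executed_at=10),
    LoggedQuery("b", projection_vars=3, bgps=2, triple_patterns=5, executed_at=11),
    LoggedQuery("c", projection_vars=2, bgps=1, triple_patterns=4, executed_at=12),
    LoggedQuery("d", projection_vars=2, bgps=1, triple_patterns=3, executed_at=13),
    LoggedQuery("e", projection_vars=1, bgps=2, triple_patterns=5, executed_at=1),
]

# A window of 4 is used here only to keep the example small.
selected = personalize(log, recent=4)
print([q.query_id for q in selected])
```

Query "e" satisfies the feature criteria but falls outside the recency window, which illustrates that the window is applied before the feature filter in this sketch.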
  • 19. Metrics: similarity error; diversity score (L is the query log, B is the benchmark, and k is the set of all features); number of containment tests (#T); benchmark generation time (G) in seconds. We compared: FEASIBLE; FEASIBLE-Exemplars; KMeans++; DBSCAN+KMeans++; Random selection
  • 20. Metrics: Query Mixes per Hour (QMpH); number of handled test cases; number of timed-out test cases. We compared: TreeSolver; AFMU; SPARQL-Algebra; JSAC. We generated benchmarks using Semantic Web Dog Food (SWDF) and DBpedia query logs
  • 21. [Figure: similarity error vs. number of super-queries for FEASIBLE, KMeans++, DBScan+KMeans++, Random, and FEASIBLE-Exemplars; panels: SWDF and DBpedia] Similarity error is, in general, inversely proportional to benchmark size. Random selection in general generates benchmarks with smaller similarity errors
  • 22. [Figure: diversity score vs. number of super-queries for FEASIBLE, KMeans++, DBScan+KMeans++, Random, and FEASIBLE-Exemplars; panels: SWDF and DBpedia] Diversity score is, in general, inversely proportional to benchmark size. FEASIBLE-Exemplars generates the more diverse benchmarks
  • 23. No significant differences
  • 24. [Figure: normalized standard deviation per query feature, SQCFrameWork-FEASIBLE-Exemplars vs. SQC-Benchmark] SQCFrameWork-FEASIBLE-Exemplars is more diverse across the majority of the query features. *SQC-Benchmark: http://sparql-qc-bench.inrialpes.fr/
  • 25. JSAC correctly handled all cases with reasonable QMpH. [Figure: QMpH for TreeSolver, AFMU, JSAC, and SPARQL-Algebra]
      Solver           Total Tests   #Handled Tests   #Correct Tests   #Timeout Tests
      TreeSolver       1192          5                5                2
      AFMU             1192          5                5                12
      SPARQL-Algebra   1192          0                0                0
      JSAC             1192          1192             1192             0
  • 26. SQCFramework: based on real data and real log queries; flexible; customizable; use-case specific. Similarity error is, in general, inversely proportional to benchmark size. Random selection in general generates benchmarks with smaller similarity errors. Diversity score is, in general, inversely proportional to benchmark size. FEASIBLE-Exemplars generates the more diverse benchmarks. JSAC correctly handled all cases with reasonable QMpH. SQCFramework available from (https://github.com/dice-group/sqcframework)