Waqar Alamgir / 4850580
Prof. Dr. Wolf-Tilo Balke
Technische Universität Braunschweig
RDF Join Query Processing with Dual Simulation Pruning
1. Introduction & Motivation
[Figure: Relational Databases vs. Graph Databases. Source: Google Trends, query on 31.10.2018]
2. Research Work
2. Problem: RDBMS for Knowledge Graphs
● Knowledge graphs are stored in traditional database systems.
● SPARQL is evaluated with relational database query processing.
● 6-way indexes are built on the RDF data.
● Joins produce large intermediate results.
Scalability of the query processor remains a challenge.
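The "6-way indexes" bullet refers to indexing each triple under all six orderings of subject, predicate, and object (SPO, SOP, PSO, POS, OSP, OPS). The following is a minimal sketch of that idea using nested dictionaries. The function name and the sample triples are illustrative assumptions, not taken from any particular system.

```python
from collections import defaultdict
from itertools import permutations


def build_six_way_indexes(triples):
    """Index every triple under all six orderings of (s, p, o)."""
    indexes = {}
    for order in permutations("spo"):
        # Two-level lookup: first key -> second key -> set of third terms.
        index = defaultdict(lambda: defaultdict(set))
        for s, p, o in triples:
            term = {"s": s, "p": p, "o": o}
            first, second, third = (term[k] for k in order)
            index[first][second].add(third)
        indexes["".join(order)] = index
    return indexes


# Illustrative triples (assumed for this sketch).
triples = [
    ("Keanu Reeves", "Starring", "The Matrix"),
    ("The Matrix", "DirectedBy", "Lana Wachowski"),
]
indexes = build_six_way_indexes(triples)

# The POS index answers "which subjects have this predicate and object?"
subjects = indexes["pos"]["Starring"]["The Matrix"]
```

Every triple is stored six times, which makes any single triple pattern cheap to look up but does nothing to shrink the intermediate results of joins between patterns.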
2. Solution: In-Memory Graph Databases
● In-memory data structure.
● Data structures that are suitable for disk storage as well as for main memory.
● Suitable for fast data retrieval.
● Small intermediate join results.
My research project is based on two projects:
1. BitMat
2. SparqlSim
3. Core Concepts
3. Knowledge Graphs
How do I represent the following fact:
“Pluto was discovered in 1930”
in an intuitive way?
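In RDF, a fact like this is usually written as a subject-predicate-object triple, which can be read as a labeled edge in a graph. The sketch below shows one possible encoding. The `http://example.org/` namespace and the predicate name `discoveredIn` are assumptions made for illustration only.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace, used only for this example.
EX = "http://example.org/"

fact = Triple(EX + "Pluto", EX + "discoveredIn", "1930")


def to_ntriples(triple: Triple) -> str:
    """Render the triple as one N-Triples line with a plain literal object."""
    return f'<{triple.subject}> <{triple.predicate}> "{triple.obj}" .'


line = to_ntriples(fact)
```

Read as a graph, the subject is a node, the predicate is the edge label, and the object is the node (here a literal) that the edge points to.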
BitMat
● A compressed bit-matrix structure for storing huge RDF graphs.
● A scalable lightweight join query processor for RDF data.
● Employs a pruning technique to avoid building intermediate join tables, followed by a variable-binding matching algorithm on in-memory BitMats.
● Author: Medha Atre, Rensselaer Polytechnic Institute, Troy, USA.
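To make the bit-matrix idea concrete, here is a toy sketch, not BitMat's actual storage format or API. It assumes that, for a single predicate, subjects are rows and objects are columns, and that a bit is set when the triple exists. All function names and the integer IDs are hypothetical, and no compression is applied.

```python
def build_bit_matrix(pairs, num_rows, num_cols):
    """Rows are subject IDs, columns are object IDs; 1 means the triple exists."""
    matrix = [[0] * num_cols for _ in range(num_rows)]
    for row, col in pairs:
        matrix[row][col] = 1
    return matrix


def fold_onto_rows(matrix):
    """One bit per subject: 1 if that subject has at least one triple left."""
    return [int(any(row)) for row in matrix]


def fold_onto_columns(matrix):
    """One bit per object: 1 if that object has at least one triple left."""
    return [int(any(col)) for col in zip(*matrix)]


def mask_columns(matrix, mask):
    """Clear every column whose mask bit is 0, pruning those triples."""
    return [[bit & keep for bit, keep in zip(row, mask)] for row in matrix]


# Hypothetical IDs: subjects 0..1 and objects 0..2 for one predicate.
starring = build_bit_matrix([(0, 0), (0, 1), (1, 2)], num_rows=2, num_cols=3)

# Suppose a join with another triple pattern only allows objects 0 and 2.
pruned = mask_columns(starring, [1, 0, 1])
```

Pruning here is a bitwise operation on the matrix itself, so no intermediate join table has to be materialized to discard the triple in column 1.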
SparqlSim
● A software prototype to compute graph algorithms on RDF data.
● Reads SPARQL queries but computes simulations as answers.
● Was originally designed to compute pruning for SPARQL query processing [ICDE’19] based on graph pattern matching principles, namely dual simulation.
● Author: Stephan Mennicke, Technische Universität Braunschweig, Germany.
[Figure: example graph. Actors: Keanu Reeves, Tom Cruise. Movies: The Matrix, John Wick, MI. Directors: Lana Wachowski, Brad Bird. Edge labels: Starring, DirectedBy.]
SELECT ?actor ?movie ?director WHERE
{
?actor <Starring> ?movie .
?movie <DirectedBy> ?director .
}
Y = [1, 0, 1, 0, 1, 0]
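The following is a minimal sketch of dual simulation pruning for the query above, not SparqlSim's implementation. The slide text lists the nodes and edge labels but not which edges exist, so the edge list below is an assumption made for illustration. The function names are hypothetical as well.

```python
def largest_dual_simulation(query_edges, data_edges):
    """Shrink candidate sets per query variable until a fixpoint is reached."""
    nodes = {n for s, _, o in data_edges for n in (s, o)}
    variables = {v for s, _, o in query_edges for v in (s, o)}
    sim = {v: set(nodes) for v in variables}
    changed = True
    while changed:
        changed = False
        for x, label, y in query_edges:
            # Data edges that can still match this query edge.
            matching = [(s, o) for s, l, o in data_edges
                        if l == label and s in sim[x] and o in sim[y]]
            keep_x = {s for s, _ in matching}  # nodes with a matching outgoing edge
            keep_y = {o for _, o in matching}  # nodes with a matching incoming edge
            if not sim[x] <= keep_x:
                sim[x] &= keep_x
                changed = True
            if not sim[y] <= keep_y:
                sim[y] &= keep_y
                changed = True
    return sim


def prune(query_edges, data_edges, sim):
    """Keep only triples that some query edge can still match."""
    return [(s, l, o) for s, l, o in data_edges
            if any(l == ql and s in sim[x] and o in sim[y]
                   for x, ql, y in query_edges)]


# Assumed edges: the extracted slide does not state them.
data = [
    ("Keanu Reeves", "Starring", "The Matrix"),
    ("Keanu Reeves", "Starring", "John Wick"),
    ("Tom Cruise", "Starring", "MI"),
    ("The Matrix", "DirectedBy", "Lana Wachowski"),
    ("MI", "DirectedBy", "Brad Bird"),
]
query = [
    ("?actor", "Starring", "?movie"),
    ("?movie", "DirectedBy", "?director"),
]

sim = largest_dual_simulation(query, data)
kept = prune(query, data, sim)
```

Under these assumed edges, John Wick has no DirectedBy edge, so it drops out of the candidates for `?movie` and the triple (Keanu Reeves, Starring, John Wick) is pruned before any join is computed.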
4. Implementation
Architecture
[Figure: architecture diagram]
Rdf Bridge
● A high-performance tool written in Python & Go that generates BitMat-structured databases.
● Takes a knowledge base as input.
● Generates a BitMat database and a BitMat configuration file.
● Highly distributed and tested with the SWAT Project's Lehigh University Benchmark (LUBM) with 1.3B triples.
Built using Python, Go, Redis & available at:
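A bit-matrix layout needs every term to be addressable by an integer row or column position, so a conversion step of this kind plausibly involves dictionary encoding. The sketch below illustrates that idea only. It is not the Rdf Bridge code, and the function name, the ID scheme (separate tables per position, IDs starting at 1), and the sample triples are all assumptions.

```python
def encode_triples(triples):
    """Map each term to an integer ID and return the encoded triples."""
    subjects, predicates, objects = {}, {}, {}

    def get_id(table, term):
        # Assign the next free ID on first sight, reuse it afterwards.
        return table.setdefault(term, len(table) + 1)

    encoded = [
        (get_id(subjects, s), get_id(predicates, p), get_id(objects, o))
        for s, p, o in triples
    ]
    return encoded, subjects, predicates, objects


# Illustrative input (assumed for this sketch).
triples = [("a", "p", "b"), ("a", "q", "c"), ("b", "p", "c")]
encoded, subjects, predicates, objects = encode_triples(triples)
```

The dictionary sizes give the dimensions of the matrices, which is the kind of information a configuration file for the generated database would need to carry.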
BitMat Interface
● A GUI tool for executing SPARQL queries directly against BitMat.
● Has an internal parser that converts SPARQL into a BitMat query.
● Pretty-prints the query result in the interface, along with query statistics.
● Tested with the SWAT Project's Lehigh University Benchmark (LUBM) with 1.3B triples.
Screening Graph in BitMat
● A BitMat extension that supports the method of inequalities in addition to the existing pruning approach.
● Implemented so that both the original pruning and the method-of-inequalities pruning can be evaluated on the same queries.
● Results in fewer intermediate join results: for some queries we were able to prune more than 65M triples on the LUBM dataset.
Written in C++ & available at:
5. Results & Future Focus
Evaluation
We tested both pruning approaches with the following datasets (Sub, Pre, Obj: number of subjects, predicates, objects):
● LUBM 1 - 103K Triples - Sub: 17K Pre: 18 Obj: 13K
● LUBM Full - 1.3B Triples - Sub: 223M Pre: 18 Obj: 167M
● DBPedia Sample - 97.5M Triples - Sub: 25.4K Pre: 31.4K Obj: 25.3M
● WikiData Sample - 318.5M Triples - Sub: 62M Pre: 4.6K Obj: 68.9M
Evaluation was performed on:
Results Comparison
Dataset            Triples   Queries   Avg. Prune Time (secs), BitMat   Avg. Prune Time (secs), SparqlSim
LUBM 1             103K      23        0.000753                         0.204330
LUBM Full          1.3B      23        6.601197                         2.802187
DBPedia Sample     97.5M     28        0.002939                         0.033740
WikiData Sample    318.5M    43        0.031471                         0.020458
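The relative differences are easier to read as ratios. The short script below computes, for each dataset, which tool had the lower average prune time and by what factor. It uses only the numbers reported in the comparison. The helper name `faster_tool` is introduced here for illustration.

```python
# Average prune times in seconds: (BitMat, SparqlSim), as reported above.
results = {
    "LUBM 1": (0.000753, 0.204330),
    "LUBM Full": (6.601197, 2.802187),
    "DBPedia Sample": (0.002939, 0.033740),
    "WikiData Sample": (0.031471, 0.020458),
}


def faster_tool(bitmat_secs, sparqlsim_secs):
    """Return the tool with the lower time and the ratio slower/faster."""
    if bitmat_secs < sparqlsim_secs:
        return "BitMat", sparqlsim_secs / bitmat_secs
    return "SparqlSim", bitmat_secs / sparqlsim_secs


summary = {name: faster_tool(b, s) for name, (b, s) in results.items()}

for name, (tool, factor) in summary.items():
    print(f"{name}: {tool} is about {factor:.1f}x faster")
```

On these numbers BitMat prunes faster on LUBM 1 and the DBPedia sample, while SparqlSim prunes faster on LUBM Full and the WikiData sample.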
Future Focus
● SparqlSim: Pruning as a service.
By creating a pruning service, we would like to evaluate BitMat performance.
Currently under development with SparqlSim.
● Dynamic programming for SPARQL queries.
Query classification could be used to automatically decide which types of queries are suitable for which tool.
Thanks!
Any questions?
You can reach me or find more details at:
● https://github.com/waqar-alamgir/BitMat/ - Waqar Alamgir, TU Braunschweig, Germany
● Fast Dual Simulation Processing of Graph Database Queries - Stephan Mennicke, TU Braunschweig, Germany
● BitMat: A Main Memory Bit-matrix of RDF Triples - Medha Atre, Rensselaer Polytechnic Institute, Troy, USA
