Mondeca in a nutshell Benchmark Context Publications Office of the EU Datasets Benchmark Configuration Query Analysis Benchmark Results Conclu
Benchmarking Commercial RDF Stores with Publications
Office Dataset
Ghislain Auguste Atemezing, Ph.D.
Mondeca, 35 Boulevard de Strasbourg, 75010 Paris, France
Twitter: @gatemezing
Web: http://www.mondeca.com
Benchmark material: https://github.com/gatemezing/posb
04th June, 2018
Ghislain Auguste Atemezing, Ph.D MONDECA QuWeDa2018 @ ESWC 2018 04th June, 2018 1 / 24
Agenda
1 Mondeca in a nutshell
Who we are
Why do clients come to us
2 Benchmark Context
3 Publications Office of the EU Datasets
Data Workflow & Use cases
Ontology
Datasets
Requirements
4 Benchmark Configuration
Experimental set up
5 Query Analysis
Instantaneous Queries
Analytical Queries
Read/Write queries
6 Benchmark Results
7 Conclusion
8 Acknowledgements
Who we are
Mondeca in a nutshell
Located in Paris, France
Leading French semantic technology solution provider since 1999
SMA : agile and flat structure
Our solution: Smart Content Factory combines data management, content annotation and semantic search.
Major clients in publishing (e.g., Turner, AP, NPR), insurance, the goods industry, national governments and international organizations
Why do clients come to us
Mondeca in a nutshell
Why benchmark OP datasets?
1 To match the current and planned use cases of the Publications Office of the European Union (OP) against the current state of the art of RDF stores.
2 To analyze in depth both the functional requirements and the documentation of 7 commercial RDF stores: Virtuoso, GraphDB, Neo4j, Stardog, Oracle, Blazegraph and Marklogic.
3 To document and motivate the choice of a given RDF store based on key requirements defined internally after interviews.
The end goal of the study is to identify the RDF store(s) that will best match the OP's planned use cases and requirements in terms of scalability, stability and reliability.
Bench Context - OP
Publications Office of the European Union
publishes the daily Official Journal of the European Union in 23 official EU languages (24 when Irish is required).
produces and disseminates legal and general publications in a variety of paper and electronic formats.
Online services
EUR-Lex (http://eur-lex.europa.eu/): provides free access to European Union law.
EU Bookshop: the online library and bookshop of publications from the institutions and other bodies of the EU.
EU Open Data Portal: the single point of access to data from the institutions and other bodies of the European Union.
Eurovoc: a multilingual, multidisciplinary thesaurus covering the activities of the EU.
Whoiswho: the official directory of the EU.
CORDIS: repository and portal for EU-funded research projects.
Data Workflow & Use cases
CELLAR RDF is the semantic repository at OP, with the ODP store featuring Linked Data applications.
Current RDF usage / wish list
Volume: approximately 730 million triples.
The size of the RDF store increased by 500 million triples over 2 years.
OP foresees a volume of at least 1.5 billion triples in the next 2 years.
Wish: handle 10x today's volume (ca. 7 billion triples).
OP receives 100k to 200k SPARQL queries per day, with strong growth. The target architecture must handle at least 2 million queries per day.
Search via the browse-by-subject tab: http://publications.europa.eu/en/browse-by-subject
Example: https://goo.gl/Yci9Nz
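To put the daily query figures above into perspective, the following minimal sketch converts them into average per-second rates. It assumes the load is spread evenly over 24 hours, which is a simplification introduced here for illustration; real traffic has peaks, and the slides give no peak figures.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def queries_per_second(queries_per_day: int) -> float:
    """Average rate if the daily load were spread evenly over 24 hours."""
    return queries_per_day / SECONDS_PER_DAY


# Figures taken from the slide: 100k to 200k queries/day today, 2 million/day targeted.
current_low = queries_per_second(100_000)
current_high = queries_per_second(200_000)
target = queries_per_second(2_000_000)

# Volume wish: 10x the current volume of roughly 730 million triples.
current_triples = 730_000_000
wish_triples = 10 * current_triples

print(f"current: {current_low:.2f} to {current_high:.2f} queries/s on average")
print(f"target:  {target:.2f} queries/s on average")
print(f"wish:    {wish_triples / 1e9:.1f} billion triples")
```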
Ontology
OP Dataset - Ontology
Common Data Model (CDM)
CDM is the ontology used to generate the RDF datasets at OOPCE.
CDM is based on the FRBR model to represent work, expression, and manifestation.
Instances in PROD dataset
Dataset with 187 instantiated classes
covering 61% of CDM
4,958,220 blank nodes
Top 3 classes : cdm:item (4.77%);
cdm:expression (4.52%) and
cdm:manifestation (2.30%)
CDM ontology statistics
Metric               Number
Class                308
Object Property      803
Data Property        690
SubClassOf           615
SubObjectProp.       485
InverseObjectProp.   248
SubDataProperty      405
DL Expressivity      ALHOIQ(D)
Datasets
OP Dataset - Explicit knowledge
The values in the tables are explicit triples in the knowledge base.
Top five classes by number of instances in PROD dataset
Class                #Instances    Percentage
cdm:item             34,747,955    4.77
cdm:expression       32,898,325    4.52
cdm:manifestation    16,768,690    2.30
cdm:work              7,771,103    1.06
cdm:resource_legal    7,674,632    1.05
Size of dump datasets
Dataset name            Disk size   #Files   #Triples       RDF format
Normalized (.zip)       226 GB      2,195    727,442,978    NQUADS
Non normalized (.tgz)   12 GB       64       728,163,464    NQUADS
NAL dataset             282 MB      72       402,926        RDF/XML
Requirements
RDF Stores Killer Requirements
Blazegraph v2.1.4 Open Edition
Poor results in the early stage of the bench: data loading was too slow (90h 43min, almost 4 days).
Too many timeouts (15) in the first test on category 1 queries.
No support at all, despite repeated requests to the vendor to improve the results or validate our configuration file.
Neo4j
All loading tests were aborted after 40h 27min (work is in progress with Neo4j engineers to improve our ad-hoc RDF import loader).
The ad-hoc importRDF code must be ported for each Neo4j upgrade: blueprints, tinkerpop, gremlin.
Too much maintenance on this stack.
Experimental set up
Bench Configuration
Hardware Server
CPU: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, 6C/12T
RAM: 128 GB; Disk capacity: 4 TB SATA.
Operating system: CentOS 7, 64 bits, running Java 1.8.0.
Marklogic FO
CPU: Intel(R) Xeon(R) E3 1245 v5 4C/8T @ 3.5GHz
RAM: 64 GB; Disk storage: 3 x 500 GB SSD
Tools for benchmark
Jena qparse tool to validate all the queries
Open-source tool SPARQL Query Benchmarker (https://github.com/rvesse/sparql-query-bm), used with 20 runs per category to warm up the server and 5 runs for the actual benchmark
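The warm-up and measurement protocol described above (20 warm-up runs, then 5 measured runs) can be sketched as follows. This is a generic timing harness written for illustration, not the SPARQL Query Benchmarker's own code or API; `benchmark` and `fake_query_mix` are hypothetical names, and the fake mix stands in for a real round of SPARQL requests.

```python
import time
from statistics import mean
from typing import Callable, List


def benchmark(run_mix: Callable[[], None],
              warmup_runs: int = 20,
              measured_runs: int = 5) -> List[float]:
    """Run the query mix untimed to warm caches, then time the measured runs."""
    for _ in range(warmup_runs):
        run_mix()  # results discarded: these runs only warm up the server
    timings = []
    for _ in range(measured_runs):
        start = time.perf_counter()
        run_mix()
        timings.append(time.perf_counter() - start)
    return timings


# Hypothetical stand-in for one execution of a category's query mix.
calls = []


def fake_query_mix() -> None:
    calls.append(1)


timings = benchmark(fake_query_mix)
print(f"{len(calls)} executions, {len(timings)} timed, mean {mean(timings):.6f}s")
```

Only the measured runs contribute to the reported timings, so cold-cache effects from the first executions do not skew the averages.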
Experimental set up
Triple stores setup
Virtuoso
NumberOfBuffers = 5450000 and MaxDirtyBuffers = 4000000
Stardog
Set Java heap size = 16GB and MaxDirectMemorySize = 8GB.
Strict parsing option deactivated; SL option by default.
GraphDB
Set entity index size to 500000000 with the entity predicate list enabled.
Content index disabled.
Oracle
pga_aggregate_limit = 64GB and pga_aggregate_target = 32GB
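As a rough sanity check on the Virtuoso settings above, the sketch below estimates how much memory the buffer pool represents. It assumes each buffer caches one 8 KiB database page and ignores per-buffer overhead; that page size is an assumption stated here, not a figure from the slides.

```python
PAGE_BYTES = 8 * 1024  # assumption: one 8 KiB page per buffer, overhead ignored


def buffer_pool_gib(number_of_buffers: int, page_bytes: int = PAGE_BYTES) -> float:
    """Approximate memory held by the given number of buffers, in GiB."""
    return number_of_buffers * page_bytes / 2**30


pool = buffer_pool_gib(5_450_000)   # NumberOfBuffers from the slide
dirty = buffer_pool_gib(4_000_000)  # MaxDirtyBuffers from the slide

print(f"NumberOfBuffers ~ {pool:.1f} GiB, MaxDirtyBuffers ~ {dirty:.1f} GiB")
```

Under that assumption the pool stays well below the 128 GB of RAM on the benchmark server.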
Instantaneous Queries
Query Analysis
FIGURE – Queries of Category 1
20 instantaneous queries
Query form   #Total
SELECT       16
DESCRIBE     3
CONSTRUCT    1
Analytical Queries
Query Analysis
FIGURE – Queries of Category 2
Analytical queries
24 queries
Query form : SELECT
Read/Write queries
Query Analysis
All the queries were gathered and developed by OP's metadata teams.
The queries were originally optimized for Virtuoso.
The results of the quantitative benchmark are therefore probably biased in favor of the current triple store.
To remove the bias, we asked the other vendors to provide us with queries optimized for their engines.
We present the results of the quantitative study, which is part of a broader study covering 66 functional requirements.
Bulk loading
Load times in hours, fastest first:
PROD dataset (727 million triples): Virtuoso (3.8), Stardog (4.59), Marklogic (5.83), Oracle (23.07) and GraphDB (35.64). Oracle was later optimized to 8h.
2 billion triples (generated by postfixing resources of type publications.europa.eu/resource/cellar): Virtuoso (13.01), Stardog (13.30), GraphDB (17.46), Marklogic (27.96) and Oracle (43.7). Oracle was later optimized to 32h.
5 billion triples: Virtuoso (36.10), GraphDB (44.14), Marklogic (169.95), Stardog (unsuccessful), Oracle (N/A).
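The PROD load times can be turned into an average loading throughput, which makes the gap between stores easier to read. This is a small sketch that assumes the reported times are decimal hours and uses the triple count of the normalized dump (727,442,978); the `triples_per_second` helper is introduced here for illustration.

```python
PROD_TRIPLES = 727_442_978  # normalized dump size reported for the PROD dataset

# Bulk-load times for the PROD dataset, in hours, as reported on the slide.
load_hours = {
    "Virtuoso": 3.8,
    "Stardog": 4.59,
    "Marklogic": 5.83,
    "Oracle": 23.07,
    "GraphDB": 35.64,
}


def triples_per_second(triples: int, hours: float) -> float:
    """Average loading throughput over the whole load."""
    return triples / (hours * 3600)


throughput = {store: triples_per_second(PROD_TRIPLES, h)
              for store, h in load_hours.items()}

ranking = sorted(throughput, key=throughput.get, reverse=True)
for store in ranking:
    print(f"{store:<10} {throughput[store]:>10,.0f} triples/s")
```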
Results Category 1 - timeout = 60s
Virtuoso is faster than all the other triple stores.
No timeout with Virtuoso. Marklogic (1 timeout), Oracle (2 timeouts), Stardog (2 timeouts), GraphDB (4 timeouts) and Blazegraph (15 timeouts).
Stardog performs poorly compared to GraphDB and Oracle.
Blazegraph was removed after this test.
Marklogic is NOT constant in multithreading.
Results Category 2 - time out=600s
Virtuoso is faster than all the other triple stores.
No timeout with Virtuoso, GraphDB and Marklogic.
1 query timed out (Q10) with Oracle.
4 queries timed out (Q15, Q16, Q19, Q22) with Stardog.
Bench analytic queries ranking
RDF Store          #Timeouts   Rank
Virtuoso           0           1
Stardog            4           6
GraphDB EE         0           3
GraphDB EE RDFS+   0           2
Marklogic          0           5
Oracle 12c         1           4
Results Category 3 - time out=10s
Query mix: 1 CONSTRUCT, 1 DELETE/INSERT and 3 INSERT IN queries.
Virtuoso is the fastest, followed by Marklogic and GraphDB.
Oracle performs worse in the single-thread scenario.
Stardog and Oracle scores are significantly lower than those of Marklogic and GraphDB.
Results Category 3 - time out=10s
Oracle performs better in the multithread scenario. Why? Possibly index/disk calibration.
Stardog stays in the same order of magnitude of QMpH.
GraphDB and Marklogic change significantly from 5 clients to 20 clients.
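QMpH (query mixes per hour) is the throughput metric used in these results. A minimal sketch of how it can be derived from a run is shown below; the `qmph` helper and the sample numbers are hypothetical and are not taken from the benchmark.

```python
def qmph(completed_mixes: int, elapsed_seconds: float) -> float:
    """Query mixes per hour: completed mixes scaled to a one-hour window."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completed_mixes * 3600 / elapsed_seconds


# Hypothetical example: 5 query mixes completed in 90 seconds of wall-clock time.
example = qmph(5, 90.0)
print(f"{example:.1f} query mixes per hour")
```

With several parallel clients, the completed mixes of all clients are summed over the same wall-clock window, which is why the metric can change sharply between 5 and 20 clients.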
Stability Test
Stress test on the triple stores using instantaneous queries (category 1).
The test starts with a specified number of parallel clients (128).
Each client runs the query mix in parallel.
The number of parallel clients is then multiplied by 2 and the process is repeated.
This repeats until either the maximum runtime (180 min) or the maximum number of threads is reached.
Result Stress Test
Stardog and Oracle finished by reaching the limit of parallel threads.
Virtuoso and GraphDB completed the test after 180 min, reaching 256 parallel threads.
GraphDB shows fewer errors than Virtuoso.
GraphDB is likely to be the most stable, followed in order by Stardog, Oracle and Virtuoso.
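The ramp-up procedure described above can be sketched as a simple loop: start at 128 clients, double after each round, and stop at the runtime or thread limit. This is an illustration under stated assumptions, not the benchmark tool's implementation; `stress_test`, `max_clients` and the injected `clock` are hypothetical, and the slides do not give the actual thread limit.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def stress_test(run_mix: Callable[[], None],
                start_clients: int = 128,
                max_clients: int = 1024,        # hypothetical limit, not from the slides
                max_runtime_s: float = 180 * 60,
                clock: Callable[[], float] = time.monotonic
                ) -> List[Tuple[int, int]]:
    """Return (clients, errors) for every completed concurrency level."""
    started = clock()
    clients = start_clients
    levels = []
    while clients <= max_clients and clock() - started < max_runtime_s:
        errors = 0
        with ThreadPoolExecutor(max_workers=clients) as pool:
            futures = [pool.submit(run_mix) for _ in range(clients)]
            for future in futures:
                # exception() waits for completion and returns None on success
                if future.exception() is not None:
                    errors += 1
        levels.append((clients, errors))
        clients *= 2  # double the number of parallel clients for the next round
    return levels


# Small demonstration with a no-op query mix and reduced client counts.
levels = stress_test(lambda: None, start_clients=2, max_clients=8)
print(levels)
```

Counting failed mixes per level gives the kind of error comparison reported for GraphDB and Virtuoso.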
What We learned
Lessons learned
3 out of the 5 RDF stores come close to the key requirements (Robustness,
Scalability, Reliability and Stability)
None of the RDF stores perfectly matches OP’s business cases
When pushed to their limits, all of the RDF stores require extensive support from
the vendors (e.g., case of Oracle 12c and GraphDB)
Conclusions
We have presented a quantitative comparison of 5 commercial RDF stores: Virtuoso, GraphDB, Oracle, Stardog and Marklogic, based on OP datasets and requirements.
The results show that Virtuoso and Stardog are the fastest in bulk loading.
Virtuoso outperforms GraphDB, Stardog and Oracle, in that order, in query-based performance.
GraphDB is the winner of the stability test performed in this benchmark.
This study gives an overview of the current state of RDF store performance with respect to OP's dataset.
This work can be partly reused to assess enterprise RDF stores.
We plan to get query rewrites from all the store vendors and evaluate the results.
We also plan to run the same benchmark on AWS Neptune (http://blog.mondeca.com/2018/02/09/requetes-sparql-avec-neptune/).
We plan to better compare this work with state-of-the-art benchmarking, possibly using the IGUANA framework.
Acknowledgements
We would like to thank the RDF teams at Ontotext, Stardog Union, Oracle, Marklogic and OpenLink.
Querying a Complex Web-Based KB for Cultural Heritage Preservation
 

Benchmarking Commercial RDF Stores with Publications Office Dataset

  • 1. Benchmarking Commercial RDF Stores with Publications Office Dataset
      Ghislain Auguste Atemezing, Ph.D., Mondeca, 35 Boulevard de Strasbourg, 75010 Paris, France
      Twitter: @gatemezing
      Web: http://www.mondeca.com
      Benchmark material: https://github.com/gatemezing/posb
      QuWeDa 2018 @ ESWC 2018, 4 June 2018
  • 2. Agenda
      1. Mondeca in a nutshell: Who we are; Why do clients come to us
      2. Benchmark Context
      3. Publications Office of the EU Datasets: Data Workflow & Use cases; Ontology; Datasets; Requirements
      4. Benchmark Configuration: Experimental set up
      5. Query Analysis: Instantaneous Queries; Analytical Queries; Read/Write queries
      6. Benchmark Results
      7. Conclusion
      8. Acknowledgements
  • 3. Mondeca in a nutshell: Who we are
      - Located in Paris, France
      - Leading French semantic technology solution provider since 1999
      - SMA: agile and flat structure
      - Our solution: Smart Content Factory combines data management, content annotation and semantic search
      - Major clients in publishing (e.g., Turner, AP, NPR), the insurance domain, the goods industry, national governments and international organizations
  • 4. Mondeca in a nutshell: Why do clients come to us
  • 5. Why benchmark the PO datasets?
      1. To match the current and planned use cases of the Publications Office of the European Union (OP) against the current state of the art of RDF stores.
      2. To analyze in depth both the functional requirements and the documentation of 7 commercial RDF stores: Virtuoso, GraphDB, Neo4j, Stardog, Oracle, Blazegraph and Marklogic.
      3. To document and motivate the choice of a given RDF store based on key requirements defined internally after interviews.
      The end goal of the study is to identify the RDF store(s) that will best match the OP's planned use cases and requirements in terms of scalability, stability and reliability.
  • 6. Benchmark Context: OP
      The Publications Office of the European Union:
      - publishes the daily Official Journal of the European Union in 23 official EU languages (24 when Irish is required);
      - produces and disseminates legal and general publications in a variety of paper and electronic formats.
      Online services:
      - EUR-Lex (http://eur-lex.europa.eu/): provides free access to European Union law.
      - EU Bookshop: the online library and bookshop of publications from the institutions and other bodies of the EU.
      - EU Open Data Portal: the single point of access to data from the institutions and other bodies of the European Union.
      - Eurovoc: a multilingual, multidisciplinary thesaurus covering the activities of the EU.
      - Whoiswho: the official directory of the EU.
      - CORDIS: repository and portal for EU-funded research projects.
  • 7. Data Workflow & Use cases
      CELLAR RDF is the semantic repository at OP, with the ODP store featuring Linked Data applications.
      Current RDF usage and wish list:
      - Volume: approximately 730 million triples.
      - The RDF store increased by 500 million triples over 2 years.
      - OP foresees a volume of at least 1.5 billion triples in the next 2 years.
      - Wish: handle 10x today's volume (ca. 7 billion triples).
      - OP receives 100k to 200k SPARQL queries per day, with strong growth.
      - The target architecture must handle at least 2 million queries per day.
      Search via the browse-by-subject tab: http://publications.europa.eu/en/browse-by-subject (example: https://goo.gl/Yci9Nz)
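To put the volume and query targets above into per-second terms, here is a small arithmetic sketch. The input figures are those quoted on the slide; the assumption that queries are spread evenly over 24 hours is ours, so real peak load would likely be higher.

```python
# Figures quoted on the slide.
CURRENT_TRIPLES = 730_000_000          # approximate current volume
WISH_FACTOR = 10                       # "handle 10x today's volume"
CURRENT_QUERIES_PER_DAY = (100_000, 200_000)
TARGET_QUERIES_PER_DAY = 2_000_000

SECONDS_PER_DAY = 24 * 60 * 60


def wish_volume(current_triples: int, factor: int) -> int:
    """Triples the store should hold under the 'wish' scenario."""
    return current_triples * factor


def average_qps(queries_per_day: int) -> float:
    """Average queries per second, assuming a uniform spread over the day."""
    return queries_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"wish volume: {wish_volume(CURRENT_TRIPLES, WISH_FACTOR):,} triples")
    low, high = CURRENT_QUERIES_PER_DAY
    print(f"current load: {average_qps(low):.1f} to {average_qps(high):.1f} queries/s")
    print(f"target load: {average_qps(TARGET_QUERIES_PER_DAY):.1f} queries/s")
```

The 10x wish works out to 7.3 billion triples, consistent with the "ca. 7 billion" on the slide.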
  • 8. OP Dataset: Ontology
      Common Data Model (CDM):
      - CDM is the ontology used to generate the RDF dataset at OOPCE.
      - CDM is based on the FRBR model to represent work, expression and manifestation.
      Instances in the PROD dataset:
      - 187 instantiated classes, covering 61% of CDM
      - 4,958,220 blank nodes
      - Top 3 classes: cdm:item (4.77%), cdm:expression (4.52%) and cdm:manifestation (2.30%)
      CDM ontology statistics:
      Metric               Number
      Class                308
      Object Property      803
      Data Property        690
      SubClassOf           615
      SubObjectProp.       485
      InverseObjectProp.   248
      SubDataProperty      405
      DL Expressivity      ALHOIQ(D)
  • 9. OP Dataset: Explicit knowledge
      The values in the tables are explicit triples in the knowledge base.
      Top five classes by number of instances in the PROD dataset:
      Class                 #Instances    Percentage
      cdm:item              34,747,955    4.77
      cdm:expression        32,898,325    4.52
      cdm:manifestation     16,768,690    2.30
      cdm:work              7,771,103     1.06
      cdm:resource_legal    7,674,632     1.05
      Size of dump datasets:
      Dataset name             Disk size   #Files   #Triples       RDF format
      Normalized (.zip)        226 GB      2,195    727,442,978    NQUADS
      Non normalized (.tgz)    12 GB       64       728,163,464    NQUADS
      NAL dataset              282 MB      72       402,926        RDF/XML
  • 10. Requirements: RDF Stores Killer Requirements
      Blazegraph v2.1.4 Open Edition:
      - Poor results in the early stage of the benchmark: data loading was too slow (90h 43min, almost 4 days).
      - Too many time-outs (15) in the first test on queries from category 1.
      - No support at all from the vendor, despite repeated requests to help improve results or validate our configuration file.
      Neo4J:
      - All loading tests aborted after 40h 27min. (Work is in progress with Neo4J engineers to improve our ad-hoc RDF import loader.)
      - The ad-hoc importRDF code must be ported for each Neo4j upgrade: blueprints, tinkerpop, gremlin.
      - Too much maintenance on this stack.
  • 11. Experimental set up: Bench Configuration
      Hardware server:
      - CPU: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, 6C/12T
      - RAM: 128 GB; disk capacity: 4 TB SATA
      - Operating system: CentOS 7, 64 bits, running Java 1.8.0
      Marklogic FO:
      - CPU: Intel(R) Xeon(R) E3 1245 v5 4C/8T @ 3.5GHz
      - RAM: 64 GB; disk storage: 3 x 500 GB SSD
      Tools for the benchmark:
      - Jena qparse tool to validate all the queries.
      - The open-source tool SPARQL Query Benchmarker (https://github.com/rvesse/sparql-query-bm), used with 20 runs per category to warm up the server and 5 runs for the actual benchmark.
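The warm-up-then-measure protocol described above can be sketched as follows. This is a minimal illustration only: it is not the API of SPARQL Query Benchmarker, and `run_mix` is a hypothetical callable standing in for "execute one full query mix against the store".

```python
import time
from statistics import mean
from typing import Callable, List


def benchmark(run_mix: Callable[[], None],
              warmup_runs: int = 20,
              measured_runs: int = 5) -> List[float]:
    """Run warm-up mixes (timings discarded), then time the measured mixes.

    The defaults mirror the slide: 20 warm-up runs, 5 measured runs.
    Returns the wall-clock duration of each measured run, in seconds.
    """
    for _ in range(warmup_runs):
        run_mix()  # warms caches; result and timing are ignored

    timings = []
    for _ in range(measured_runs):
        start = time.perf_counter()
        run_mix()
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Hypothetical stand-in for a real query mix: sleep for a millisecond.
    durations = benchmark(lambda: time.sleep(0.001))
    print(f"{len(durations)} measured runs, mean {mean(durations):.4f} s")
```

Discarding the warm-up timings keeps cold-cache effects out of the reported numbers, which matters when stores with different caching strategies are compared.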
  • 12. Experimental set up: Triple stores setup
      - Virtuoso: NumberOfBuffers = 5450000 and MaxDirtyBuffers = 4000000.
      - Stardog: Java heap size = 16GB and MaxDirectMemorySize = 8GB. Strict parsing option deactivated; SL option by default.
      - GraphDB: entity index size set to 500000000, with the entity predicate list enabled and the content index disabled.
      - Oracle: pga_aggregate_limit = 64GB and pga_aggregate_target = 32G.
  • 13. Query Analysis: Instantaneous Queries
      [Figure: Queries of Category 1]
      20 instantaneous queries:
      Query form    #Total
      SELECT        16
      DESCRIBE      3
      CONSTRUCT     1
  • 14. Query Analysis: Analytical Queries
      [Figure: Queries of Category 2]
      - 24 analytical queries
      - Query form: SELECT
  • 15. Query Analysis: Read/Write queries
      - All the queries were gathered and developed by OP's metadata teams.
      - The queries were originally optimized for Virtuoso, so the results of the quantitative benchmark are probably biased in favor of the current triple store.
      - To remove the bias, we asked the other vendors to provide us with queries optimized for their engines.
      - We present the results of the quantitative study, which is part of a broader study covering 66 functional requirements.
  • 16. Bulk loading
      Ranking by loading time, in hours:
      - PROD dataset (727 million triples): Virtuoso (3.8), Stardog (4.59), Marklogic (5.83), Oracle (23.07) and GraphDB (35.64). Oracle was later optimized to 8h.
      - 2 billion triples: Virtuoso (13.01), Stardog (13.30), GraphDB (17.46), Marklogic (27.96) and Oracle (43.7). Oracle was later optimized to 32h.
      - 5 billion triples: Virtuoso (36.10), GraphDB (44.14), Marklogic (169.95), Stardog (unsuccessful), Oracle (N/A).
      The 2 billion dataset was generated by postfixing resources of type publications.europa.eu/resource/cellar.
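Loading times are easier to compare across dataset sizes when converted to throughput. The sketch below does this for the PROD dataset, using the loading times quoted above and the triple count of the normalized dump (727,442,978) from the datasets slide. Pairing that exact count with these timings is our assumption, so treat the results as approximate.

```python
PROD_TRIPLES = 727_442_978  # normalized dump, from the datasets slide

# Loading times in hours for the PROD dataset, as quoted on the slide
# (before the later Oracle optimization).
PROD_LOAD_HOURS = {
    "Virtuoso": 3.8,
    "Stardog": 4.59,
    "Marklogic": 5.83,
    "Oracle": 23.07,
    "GraphDB": 35.64,
}


def triples_per_second(triples: int, hours: float) -> float:
    """Average loading throughput over the whole bulk load."""
    return triples / (hours * 3600)


if __name__ == "__main__":
    for store, hours in sorted(PROD_LOAD_HOURS.items(), key=lambda kv: kv[1]):
        rate = triples_per_second(PROD_TRIPLES, hours)
        print(f"{store:<10} {hours:>6.2f} h  {rate:>10,.0f} triples/s")
```

Under these assumptions the fastest load, Virtuoso at 3.8 hours, corresponds to roughly 53,000 triples per second.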
  • 17. Results: Category 1 (time-out = 60s)
      - Virtuoso is faster than all the other triple stores.
      - No time-out with Virtuoso. Time-outs for the others: Marklogic 1, Oracle 2, Stardog 2, GraphDB 4 and Blazegraph 15.
      - Blazegraph was removed after this test.
      - Marklogic is not constant in multithreading.
      - Stardog performs poorly compared to GraphDB and Oracle.
  • 18. Results: Category 2 (time-out = 600s)
      - Virtuoso is faster than all the other triple stores.
      - No time-out with Virtuoso, GraphDB and Marklogic.
      - 1 timed-out query (Q10) with Oracle.
      - 4 timed-out queries (Q15, Q16, Q19, Q22) with Stardog.
      Ranking on analytical queries:
      RDF store           #Time-outs   Rank
      Virtuoso            0            1
      Stardog             4            6
      GraphDB EE          0            3
      GraphDB EE RDFS+    0            2
      Marklogic           0            5
      Oracle 12c          1            4
  • 19. Results: Category 3 (time-out = 10s), single thread
      - Query mix: 1 CONSTRUCT, 1 DELETE/INSERT and 3 INSERT IN queries.
      - Virtuoso is the fastest, followed by Marklogic and GraphDB.
      - Oracle performs worse in single-thread mode.
      - Stardog and Oracle scores are significantly lower than those of Marklogic and GraphDB.
  • 20. Results: Category 3 (time-out = 10s), multithread
      - Oracle performs better in the multithread scenario. Why? Possibly index or disk calibration.
      - Stardog is constant in the magnitude of QMpH.
      - GraphDB and Marklogic show significant changes from 5 clients to 20 clients.
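The slides do not define QMpH. Assuming the usual benchmarking meaning, query mixes per hour, it can be computed from the number of completed mixes and the total runtime, as in this sketch (the function name and signature are ours, for illustration only):

```python
def query_mixes_per_hour(completed_mixes: int, total_runtime_s: float) -> float:
    """QMpH, assuming it means query mixes completed per hour of runtime."""
    if total_runtime_s <= 0:
        raise ValueError("total_runtime_s must be positive")
    return completed_mixes * 3600 / total_runtime_s


if __name__ == "__main__":
    # Hypothetical numbers: 5 measured mixes finished in 90 seconds overall.
    print(query_mixes_per_hour(5, 90.0))
```

With several parallel clients, `completed_mixes` would count the mixes finished by all clients together, which is why a store that scales well shows QMpH rising with the client count.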
  • 21. Stability Test
      Stress test on the triple stores using the instantaneous queries (category 1):
      - The test starts with a number of parallel clients = 128.
      - Each client completes a run of the query mix in parallel.
      - The number of parallel clients is then multiplied by 2 and the process is repeated.
      - This repeats until either the maximum runtime (180 min) or the maximum number of threads is reached.
      Results of the stress test:
      - Stardog and Oracle finished by reaching the limit on parallel threads.
      - Virtuoso and GraphDB completed the test after 180 min, reaching 256 parallel threads.
      - GraphDB shows fewer errors than Virtuoso.
      - GraphDB is likely the most stable, followed in order by Stardog, Oracle and Virtuoso.
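The doubling procedure of the stress test can be sketched as a simple loop. This is an illustration under stated assumptions, not the tool used in the study: `run_mix` is a hypothetical callable that runs one query mix and raises on failure, and `max_clients=1024` is a placeholder, since the slides do not state the thread limit.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def stress_test(run_mix: Callable[[], None],
                start_clients: int = 128,
                max_clients: int = 1024,          # placeholder, not from the slides
                max_runtime_s: float = 180 * 60   # 180 min, as on the slide
                ) -> List[Tuple[int, int]]:
    """Double the number of parallel clients each round until a limit is hit.

    Returns one (clients, errors) pair per completed round.
    """
    started = time.monotonic()
    clients = start_clients
    rounds = []
    while clients <= max_clients and time.monotonic() - started < max_runtime_s:
        with ThreadPoolExecutor(max_workers=clients) as pool:
            futures = [pool.submit(run_mix) for _ in range(clients)]
            # exception() waits for the client to finish, then reports failure
            errors = sum(1 for f in futures if f.exception() is not None)
        rounds.append((clients, errors))
        clients *= 2
    return rounds


if __name__ == "__main__":
    # Tiny demonstration with a no-op mix and small client counts.
    print(stress_test(lambda: None, start_clients=2, max_clients=8))
```

Counting errors per round is what allows a comparison such as "GraphDB shows fewer errors than Virtuoso" even when two stores reach the same number of parallel clients.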
  • 22. Lessons learned
      - 3 out of the 5 RDF stores come close to the key requirements (robustness, scalability, reliability and stability).
      - None of the RDF stores perfectly matches OP's business cases.
      - When pushed to their limits, all of the RDF stores require extensive support from the vendors (e.g., Oracle 12c and GraphDB).
  • 23. Conclusions
      - We have presented a quantitative comparison of 5 commercial RDF stores (Virtuoso, GraphDB, Oracle, Stardog and Marklogic) based on OP datasets and requirements.
      - The results show that Virtuoso and Stardog are faster in bulk loading.
      - Virtuoso outperforms, in order, GraphDB, Stardog and Oracle in query-based performance.
      - GraphDB is the winner of the stability test performed in this benchmark.
      - This study gives an overview of the current state of RDF store performance with respect to the PO's dataset.
      - This work can be partly used to assess enterprise RDF stores.
      - We plan to get query rewrites from all the store vendors and evaluate the results.
      - We also plan to perform the same benchmark on AWS Neptune (http://blog.mondeca.com/2018/02/09/requetes-sparql-avec-neptune/).
      - We plan to better compare this work with state-of-the-art benchmarking, maybe using the IGUANA framework.
  • 24. Acknowledgements
      We would like to thank the RDF teams at Ontotext, Stardog Union, Oracle, Marklogic and OpenLink.