SlideShare une entreprise Scribd logo
1  sur  20
Télécharger pour lire hors ligne
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Querying and reasoning over
large scale building datasets: an outline of
a performance benchmark
Pieter Pauwels, Tarcisio Mendes de Farias, Chi Zhang,
Ana Roxin, Jakob Beetz, Jos De Roo, Christophe Nicolle
International Workshop on Semantic Big Data (SBD 2016)
in conjunction with the 2016 ACM SIGMOD Conference in San Francisco, USA
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Agenda
Introduction
• Context description
• Problem identified
Testing
environment
• ifcOWL and building models
• Rules and queries
• Triple stores
Results
• Query performance
• Additional findings
Querying and reasoning over large scale building datasets: an outline of a performance benchmark
2
July 1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Context description
◼ The architectural design and construction domains work on a daily
basis with massive amounts of data.
◼ In the context of BIM, a neutral, interoperable representation of
information consists in the Industry Foundation Classes (IFC)
standard
Difficult to handle the EXPRESS format
◼ Semantic Web technologies have been identified as a possible
solution
Semantic data enrichment
Schema and data transformations
◼ A semantic approach involves 3 main components:
Schema (Tbox)
• OWL ontology
• Information structure
Instances (ABox)
• Assertions
• Respects schema
definition
Rules (RBox)
• If-Then statements
• Involving elements
from the ABox and
theTBox
Querying and reasoning over large scale building datasets: an outline of a performance benchmark
3
July 1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Problem identified
◼ Different implementations exist for the components (TBox, ABox,
RBox) of such Semantic approach
Diverse reasoning engines
Diverse query processing techniques
Diverse query handling
Diverse dataset size
Diverse dataset complexity
◼ Missing an appropriate rule and query execution performance
benchmark
Expressiveness
vs.
performance
Querying and reasoning over large scale building datasets: an outline of a performance benchmark
4
July 1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Performance benchmark variables
◼ Main components
◼ These elements are implemented into 3 different systems
SPIN (SPARQL Inference Notation) and Jena
EYE
Stardog
◼ An ensemble of queries is addressed to the so-created systems
Schema
(TBox)
• ifcOWL
Instances
(ABox)
• 369 ifcOWL-
compliant
building
models
Rules
(RBox)
• 68 data
transformation
rules
Querying and reasoning over large scale building datasets: an outline of a performance benchmark
5
July 1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
TBox - the ifcOWL ontology
◼ All building models are encoded using the ifcOWL ontology
Built up under the impulse of numerous initiatives during the last 10
years
◼ The ontology used is the one that is made publicly available by the
buildingSMART Linked Data Working Group (LDWG)
http://ifcowl.openbimstandards.org/IFC4#
http://ifcowl.openbimstandards.org/IFC4_ADD1#
http://ifcowl.openbimstandards.org/IFC2X3_TC1#
http://ifcowl.openbimstandards.org/IFC2X3_Final#
Querying and reasoning over large scale building datasets: an outline of a performance benchmark
6
July 1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Call for papers – special issue in SWJ
◼ Semantic Web Journal – Interoperability, Usability, Applicability
http://www.semantic-web-journal.net
◼ Special issue on "Semantic Technologies and Interoperability in the
Built Environment"
◼ Important dates
March, 1st 2017 – paper submission deadline
May 1st 2017 – notification of acceptance
Ontologies for
AEC/FM
Linking BIM
models to
external data
sources
Multiple scale
integration
through semanitc
interoperability
Multilingual data
access and
annotation
Query
processing, query
performance
Semantic-based
building
monitoring
systems
Reasoning with
building data
Building data
publication
strategies
Big Linked Data
for building
information
Querying and reasoning over large scale building datasets: an outline of a performance benchmark
7
July 1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
ifcOWL Stats
July 1st, 2016 Querying and reasoning over large scale building datasets: an outline of a performance benchmark
8
Axioms 21306
Logical Axioms 13649
Classes 1230
Object properties 1578
Data properties 5
Individuals 1627
DL expressivity SROIQ(D)
SubClassOf axioms 4622
EquivalentClasses axioms 266
DisjointClasses axioms 2429
SubObjectPropertyOf axioms 1
InverseObjectProperties axioms 94
FunctionalObjectProperty axioms 1441
TransitiveObjectProperty axioms 1
ObjectPropertyDomain axioms 1577
ObjectPropertyRange axioms 1576
FunctionalDataProperty axioms 5
DataPropertyDomain axioms 5
DataPropertyRange axioms 5
Pieter Pauwels and Walter Terkaj, EXPRESS to OWL
for construction industry: towards a recommendable
and usable ifcOWL ontology. Automation in
Construction 63: 100-133 (2016).
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
ABox – Building sets
◼ Some BIM models are publicly available (364), whereas other are
undisclosed (5)
Building information models
created with different BIM
modelling environments
Exported to IFC2x3
Transformed into ifcOWL-
compliant RDF graphs using
a publicly available converter
BIM environment Number of files
Tekla Structures 227 (61,5%)
unknown or manual 38 (10,3%)
Autodesk Revit 27 (7,3%)
Xella BIM 15
Autodesk AutoCAD 12
iTConcrete 9
SDS 8
Nemetschek AllPlan 7
GraphiSoft ArchiCAD 5
Various others 21
IFC instances Average file size Number of files
0 – 500,000 0 – 30 MB 321
500,000 –
2,000,000
30 – 100 MB 37
> 2,000,000 > 100 MB 11
Querying and reasoning over large scale building datasets: an outline of a performance benchmark
9
July 1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
RBox – Data transformation rules
◼ Need for a representative set of rewrite rules
◼ 68 manually built rules
◼ Classified in several rule sets according to their content
Rule Set
(RS)
Description
RS1
Contains 2 rules for rewriting property set references into additional property statements
sbd:hasPropertySet and sbd:hasProperty. This is a small, yet often used rule set that can be used in
many contexts to simplify querying and data publication of common simple properties attached to IFC entity
instances.
RS2
Includes 31 rules, all involving subtypes of the IfcRelationship class (e.g. ifcowl:IfcRelAssigns,
ifcowl:IfcRelDecomposes, ifcowl:IfcRelAssociates, ifcowl:IfcRelDefines,
ifcowl:IfcRelConnects)
RS3 Contains 3 rules related to handling lists in IFC.
RS4 Contains one rule that allows wrapping simple data types.
RS5
Consists of 20 rules for inferring single property statements sbd:hasPropertySet and
sbd:hasProperty.
RS6
Extends RS5 and RS1 with 6 additional rules for inferring whether an objet is internal or external to a
building.
RS7 Contains 7 rules dealing with the (de)composition of building spaces and spatial elements.
Querying and reasoning over large scale building datasets: an outline of a performance benchmark
10
July 1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
ifcOWL Example Transformation
July 1st, 2016 Querying and reasoning over large scale building datasets: an outline of a performance benchmark
11
inst:IfcWindow_1893 inst:IfcWindow_1842
inst:IfcWallStandardCase_696
sbim:hasWindowsbim:hasWindow
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Implementation
• Implemented based on the
open source APIs of Topbraid
SPIN (SPIN API 1.4.0) and
Apache Jena (Jena Core 2.11.0,
Jena ARQ 2.11.0, Jena TDB
1.0.0)
• Rules are written with Topbraid
Composer Free version, and
they are exported as RDF Turtle
files.
• A small Java program is
implemented to read RDF
models, schema, rules from the
TDB store and query data.
• All the SPARQL queries are
configured using the Jena
org.apache.jena.sparql.algebra
package
• To avoid unnecessary
reasoning processes, in this
test environment only the RDFS
vocabulary is supported.
SPIN + Jena TDB
• Version ‘EYE-
Winter16.0302.1557’ (‘SWI-
Prolog 7.2.3 (amd64): Aug 25
2015, 12:24:59’).
• EYE is a semi-backward
reasoner enhanced with Euler
path detection.
• As our rule set currently
contains only rules using =>,
forward reasoning will take
place.
• Each command is executed 5
times
• Each command includes the
full ontology, the full set of rules
and the RDFS vocabulary, as
well as one of the 369 building
model files and one of the 3
query files.
• No triple store is used: triples
are processed directly from the
considered files.
EYE
• 4.0.2 Stardog semantic graph
database (Java 8, RDF 1.1
graph data model, OWL2
profiles, SPARQL 1.1)
• OWL reasoner + rule engine.
• Support of SWRL rules,
backward-chaining reasoning
• Reasoning is performed by
applying a query rewriting
approach (SWRL rules are
taken into account during the
query rewriting process).
• Stardog allows attaining a DL-
expressivity level of SROIQ(D).
• In this approach, SWRL rules
are taken into account during
the query rewriting process.
Stardog
Querying and reasoning over large scale building datasets: an outline of a performance benchmark
12
July 1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Queries
◼ We have built a limited list of 60 queries, each of which triggers at
least one of the available rules.
◼ As we focus here on query execution performance, the considered
queries are entirely based on the right-hand sides of the considered
rules.
◼ 3 queries:
Q1 a simple query with little results,
Q2 a simple query with many results,
and Q3 a complex query that triggers a considerable number of rules
Query Query Contents
Q1 ?obj sbd:hasProperty ?p
Q2
?point sbd:hasCoordinateX ?x .
?point sbd:hasCoordinateY ?y .
?point sbd:hasCoordinateZ ?z
Q3 ?d rdf:type sbd:ExternalWall
Querying and reasoning over large scale building datasets: an outline of a performance benchmark
13
July 1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Test environment
◼ In one central server
Supplied by the University of Burgundy, research group CheckSem,
Following specifications: Ubuntu OS, Intel Xeon CPU E5-2430 at 2.2GHz,
6 cores and 16GB of DDR3 RAM memory
◼ 3 Virtual Machines (VMs) were set up in this central server
SPIN VM (Jena TDB), EYE VM (EYE inference engine), Stardog VM
(Stardog triplestore)
◼ The VMs were managed as separate test environments and
Each of these VMs had 2 cores out of 6 allocated
Each contained the above resources (ontologies, data, rules, queries).
Querying and reasoning over large scale building datasets: an outline of a performance benchmark
14
July 1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Results
◼ Queries applied on 6 hand-
picked building models of
varying size
◼ In the SPIN approach
For Q1 and Q2, the
execution time = backward-
chaining inference process
+ actual query execution
time
For Q3, execution time =
query execution time itself
◼ In the EYE approach
Networking time is ignored
◼ In the Stardog approach
Execution time = backward-
chaining inference + actual
query execution time
Query
Building
Model
SPIN
(s)
EYE
(s)
Stardog
(s)
Q1
(simple,
little
results)
BM1 135,36 37,11 13,44
BM2 1,47 0,29 0,17
BM3 24,01 4,87 1,4
BM4 41,28 12,95 3,55
BM5 4,99 1,05 0,33
BM6 0,55 0,16 0,08
Q2
(simple,
many
results)
BM1 46,17 2,10 6,82
BM2 92,03 4,20 15,83
BM3 82,68 4,12 15,28
BM4 19,93 1,04 2,81
BM5 3,69 0,21 1,36
BM6 0,74 0,045 1,00
Q3
(complex)
BM1 0,001 0,001 0,07
BM2 0,006 0,003 0,12
BM3 0,002 0,003 0,31
BM4 0,005 0,001 0,20
BM5 0,006 0,013 0,20
BM6 0,001 0,001 0,13
Querying and reasoning over large scale building datasets: an outline of a performance benchmark
15
July 1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Query time related to result count
For Q1 for each of the considered
approaches
(green = SPIN; blue = EYE; black = Stardog)
For Q2 for each of the considered
approaches
(green = SPIN; blue = EYE; black = Stardog)
Querying and reasoning over large scale building datasets: an outline of a performance benchmark
16
July 1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Additional findings
• The three considered procedures are quite far apart from each other, explaining the considerable performance
differences, not only between the procedures, but also between diverse usages within one and the same
system.
• Algorithms and optimization techniques used for each approach aren't entirely used: differences in indexation
algorithms, query rewriting techniques and rule handling strategies used.
Indexing algorithms, query rewriting techniques, and rule handling strategies
• The disadvantage of forward-chaining reasoning process is that millions of triples can be materialized (EYE,
SPIN for Q1 and Q2)
• Using backward-chaining reasoning allows avoiding triple materialization, thus saving query execution time
(Stardog, SPIN for Q3).
Forward- versus backward-chaining
• Query Q3 triggers a rule that in turn triggers several other rules in the rule set. If the first rule does not fire,
however, the process stops early.
• Query Q2, however, fires relatively long rules. It takes more time to make these matches in all three approaches.
Type of data in the building model
• Loading files in memory at query execution time leads to considerable delays.
Impact of the triple store
• Linear relation: the more results are available, the more triples need to be matched, leading to more assertions.
Impact of the number of output results
Querying and reasoning over large scale building datasets: an outline of a performance benchmark
17
July 1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Conclusion and future work
◼ Comparison of 3 different approaches
SPIN, EYE and Stardog
◼ 3 queries applied over 6 different building models
◼ Future work consists in
Specifying more this initial performance benchmark with additional
data and rules
Executing additional queries on the rest of the set of building models
Comparing results on a wider scale:
― for the individual approaches separately,
― as well as with other approaches not considered here.
Querying and reasoning over large scale building datasets: an outline of a performance benchmark
18
July 1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Thank you for your attention.
Pieter Pauwels, Tarcisio Mendes de Farias, Chi Zhang,
Ana Roxin, Jakob Beetz, Jos De Roo, Christophe Nicolle
International Workshop on Semantic Big Data (SBD 2016)
in conjunction with the 2016 ACM SIGMOD Conference in San Francisco, USA
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Query Building Model SPIN EYE Stardog
Q1
BM1 135,36 37,11 13,44
BM2 1,47 0,29 0,17
BM3 24,01 4,87 1,4
BM4 41,28 12,95 3,55
BM5 4,99 1,05 0,33
BM6 0,55 0,16 0,08
Q2
BM1 46,17 2,10 6,82
BM2 92,03 4,20 15,83
BM3 82,68 4,12 15,28
BM4 19,93 1,04 2,81
BM5 3,69 0,21 1,36
BM6 0,74 0,045 1,00
Q3
BM1 0,001 0,001 0,07
BM2 0,006 0,003 0,12
BM3 0,002 0,003 0,31
BM4 0,005 0,001 0,20
BM5 0,006 0,013 0,20
BM6 0,001 0,001 0,13
Querying and reasoning over large scale building datasets: an outline of a performance benchmark
20
July 1st, 2016

Contenu connexe

En vedette

Wind_Energy_Law_2014_Amanda James _Avoiding Regulatory Missteps for Developer...
Wind_Energy_Law_2014_Amanda James _Avoiding Regulatory Missteps for Developer...Wind_Energy_Law_2014_Amanda James _Avoiding Regulatory Missteps for Developer...
Wind_Energy_Law_2014_Amanda James _Avoiding Regulatory Missteps for Developer...
Amanda James
 
Day 3 wars and revolts
Day 3 wars and revoltsDay 3 wars and revolts
Day 3 wars and revolts
Travis Klein
 
Spending multipliers
Spending multipliersSpending multipliers
Spending multipliers
Travis Klein
 
La meva primera presentació
La meva primera presentacióLa meva primera presentació
La meva primera presentació
helenbou4352
 
Formulario agenda telefonica
Formulario agenda telefonicaFormulario agenda telefonica
Formulario agenda telefonica
Nathalia Sanchez
 
Gdp and economic indicators
Gdp and economic indicatorsGdp and economic indicators
Gdp and economic indicators
Travis Klein
 
Day 5 race & slavery
Day 5 race & slaveryDay 5 race & slavery
Day 5 race & slavery
Travis Klein
 

En vedette (20)

Wind_Energy_Law_2014_Amanda James _Avoiding Regulatory Missteps for Developer...
Wind_Energy_Law_2014_Amanda James _Avoiding Regulatory Missteps for Developer...Wind_Energy_Law_2014_Amanda James _Avoiding Regulatory Missteps for Developer...
Wind_Energy_Law_2014_Amanda James _Avoiding Regulatory Missteps for Developer...
 
Day 3 wars and revolts
Day 3 wars and revoltsDay 3 wars and revolts
Day 3 wars and revolts
 
SBIC Enterprise Information Security Strategic Technologies
SBIC Enterprise Information Security Strategic TechnologiesSBIC Enterprise Information Security Strategic Technologies
SBIC Enterprise Information Security Strategic Technologies
 
Spending multipliers
Spending multipliersSpending multipliers
Spending multipliers
 
Map of empathy
Map of empathyMap of empathy
Map of empathy
 
White Paper: The Essential Guide to Exchange and the Private Cloud
White Paper: The Essential Guide to Exchange and the Private CloudWhite Paper: The Essential Guide to Exchange and the Private Cloud
White Paper: The Essential Guide to Exchange and the Private Cloud
 
La meva primera presentació
La meva primera presentacióLa meva primera presentació
La meva primera presentació
 
Formulario agenda telefonica
Formulario agenda telefonicaFormulario agenda telefonica
Formulario agenda telefonica
 
Gdp and economic indicators
Gdp and economic indicatorsGdp and economic indicators
Gdp and economic indicators
 
The Global IT Trust Curve survey - Comprehensive Results Presentation
The Global IT Trust Curve survey - Comprehensive Results PresentationThe Global IT Trust Curve survey - Comprehensive Results Presentation
The Global IT Trust Curve survey - Comprehensive Results Presentation
 
Reasoning with rules - Application to N3/EYE and Stardog
Reasoning with rules - Application to N3/EYE and StardogReasoning with rules - Application to N3/EYE and Stardog
Reasoning with rules - Application to N3/EYE and Stardog
 
Contracts
ContractsContracts
Contracts
 
2015 day 1
2015 day 12015 day 1
2015 day 1
 
Portfolio
PortfolioPortfolio
Portfolio
 
Managing Cyber Risk: Are Companies Safeguarding Their Assets?
Managing Cyber Risk: Are Companies Safeguarding Their Assets?Managing Cyber Risk: Are Companies Safeguarding Their Assets?
Managing Cyber Risk: Are Companies Safeguarding Their Assets?
 
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
 
Abortion
AbortionAbortion
Abortion
 
Day 5 race & slavery
Day 5 race & slaveryDay 5 race & slavery
Day 5 race & slavery
 
Social Media
Social MediaSocial Media
Social Media
 
Provisioning 2.0: The Future of Provisioning
Provisioning 2.0: The Future of ProvisioningProvisioning 2.0: The Future of Provisioning
Provisioning 2.0: The Future of Provisioning
 

Similaire à Querying and reasoning over large scale building datasets: an outline of a performance benchmark

An Overview of VIEW
An Overview of VIEWAn Overview of VIEW
An Overview of VIEW
Shiyong Lu
 

Similaire à Querying and reasoning over large scale building datasets: an outline of a performance benchmark (20)

ACM SIGMOD SBD2016 - Querying and reasoning over large scale building dataset...
ACM SIGMOD SBD2016 - Querying and reasoning over large scale building dataset...ACM SIGMOD SBD2016 - Querying and reasoning over large scale building dataset...
ACM SIGMOD SBD2016 - Querying and reasoning over large scale building dataset...
 
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 
Ontology-based data access: why it is so cool!
Ontology-based data access: why it is so cool!Ontology-based data access: why it is so cool!
Ontology-based data access: why it is so cool!
 
Gluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with HadoopGluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with Hadoop
 
Big Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsBig Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other things
 
An Overview of VIEW
An Overview of VIEWAn Overview of VIEW
An Overview of VIEW
 
WSO2 Machine Learner - Product Overview
WSO2 Machine Learner - Product OverviewWSO2 Machine Learner - Product Overview
WSO2 Machine Learner - Product Overview
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
 
BigDataEurope @BDVA Summit2016 2: Societal Pilots
BigDataEurope @BDVA Summit2016 2: Societal PilotsBigDataEurope @BDVA Summit2016 2: Societal Pilots
BigDataEurope @BDVA Summit2016 2: Societal Pilots
 
Exascale Computing Project - Driving a HUGE Change in a Changing World
Exascale Computing Project - Driving a HUGE Change in a Changing WorldExascale Computing Project - Driving a HUGE Change in a Changing World
Exascale Computing Project - Driving a HUGE Change in a Changing World
 
Benefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a ServiceBenefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a Service
 
Diane Van Etten_resume
Diane Van Etten_resumeDiane Van Etten_resume
Diane Van Etten_resume
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run Time
 
0. ocp cfops flyby
0. ocp cfops flyby0. ocp cfops flyby
0. ocp cfops flyby
 
GraphTour 2020 - Neo4j: What's New?
GraphTour 2020 - Neo4j: What's New?GraphTour 2020 - Neo4j: What's New?
GraphTour 2020 - Neo4j: What's New?
 
Apache Spark and MongoDB - Turning Analytics into Real-Time Action
Apache Spark and MongoDB - Turning Analytics into Real-Time ActionApache Spark and MongoDB - Turning Analytics into Real-Time Action
Apache Spark and MongoDB - Turning Analytics into Real-Time Action
 

Plus de Ana Roxin

Plus de Ana Roxin (19)

Apporter du sens aux données BIM
Apporter du sens aux données BIMApporter du sens aux données BIM
Apporter du sens aux données BIM
 
Bringing Meaning to BIM Data
Bringing Meaning to BIM DataBringing Meaning to BIM Data
Bringing Meaning to BIM Data
 
Linked Data Vocabularies for BIM
Linked Data Vocabularies for BIMLinked Data Vocabularies for BIM
Linked Data Vocabularies for BIM
 
[Cib]achieving interoperability between bim and gis final
[Cib]achieving interoperability between bim and gis final[Cib]achieving interoperability between bim and gis final
[Cib]achieving interoperability between bim and gis final
 
Habilitation to conduct research (Habilitation à diriger des recherches)
Habilitation to conduct research (Habilitation à diriger des recherches)Habilitation to conduct research (Habilitation à diriger des recherches)
Habilitation to conduct research (Habilitation à diriger des recherches)
 
Les données liées pour le BIM
Les données liées pour le BIMLes données liées pour le BIM
Les données liées pour le BIM
 
Linked Data applications for BIM
Linked Data applications for BIMLinked Data applications for BIM
Linked Data applications for BIM
 
On the relation between Model View Definitions (MVDs) and Linked Data technol...
On the relation between Model View Definitions (MVDs) and Linked Data technol...On the relation between Model View Definitions (MVDs) and Linked Data technol...
On the relation between Model View Definitions (MVDs) and Linked Data technol...
 
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...Brief State of the Art - Semantic Web technologies for geospatial data - Mode...
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...
 
Geographic information - standards available for describing geographical data
Geographic information - standards available for describing geographical dataGeographic information - standards available for describing geographical data
Geographic information - standards available for describing geographical data
 
Semantic Web applications for mobility and social interaction
Semantic Web applications for mobility and social interactionSemantic Web applications for mobility and social interaction
Semantic Web applications for mobility and social interaction
 
Customizing Semantic Profiling for Digital Advertising
Customizing Semantic Profiling for Digital AdvertisingCustomizing Semantic Profiling for Digital Advertising
Customizing Semantic Profiling for Digital Advertising
 
An Agile Process Modelling Approach for BIM Projects
An Agile Process Modelling Approach for BIM ProjectsAn Agile Process Modelling Approach for BIM Projects
An Agile Process Modelling Approach for BIM Projects
 
A Linked Data Perspective for BIM
A Linked Data Perspective for BIMA Linked Data Perspective for BIM
A Linked Data Perspective for BIM
 
A Semantic Web Approach for defining Building Views
A Semantic Web Approach for defining Building ViewsA Semantic Web Approach for defining Building Views
A Semantic Web Approach for defining Building Views
 
Federated Approach for Interoperating AEC/FM Ontologies
Federated Approach for Interoperating AEC/FM OntologiesFederated Approach for Interoperating AEC/FM Ontologies
Federated Approach for Interoperating AEC/FM Ontologies
 
ifcWOD (Web Of Data) - Semantically Adapting IFC Model Relations into OWL Pro...
ifcWOD (Web Of Data) - Semantically Adapting IFC Model Relations into OWL Pro...ifcWOD (Web Of Data) - Semantically Adapting IFC Model Relations into OWL Pro...
ifcWOD (Web Of Data) - Semantically Adapting IFC Model Relations into OWL Pro...
 
A Semantic Web Approach for defining Building Views
A Semantic Web Approach for defining Building ViewsA Semantic Web Approach for defining Building Views
A Semantic Web Approach for defining Building Views
 
COBieOWL An OWL ontology based on COBie standard
COBieOWL An OWL ontology based on COBie standardCOBieOWL An OWL ontology based on COBie standard
COBieOWL An OWL ontology based on COBie standard
 

Dernier

Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven CuriosityUnlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Hung Le
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
Kayode Fayemi
 
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
ZurliaSoop
 

Dernier (20)

Call Girls Near The Byke Suraj Plaza Mumbai »¡¡ 07506202331¡¡« R.K. Mumbai
Call Girls Near The Byke Suraj Plaza Mumbai »¡¡ 07506202331¡¡« R.K. MumbaiCall Girls Near The Byke Suraj Plaza Mumbai »¡¡ 07506202331¡¡« R.K. Mumbai
Call Girls Near The Byke Suraj Plaza Mumbai »¡¡ 07506202331¡¡« R.K. Mumbai
 
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven CuriosityUnlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar Training
 
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfAWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
 
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdfSOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
 
lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.
 
Lions New Portal from Narsimha Raju Dichpally 320D.pptx
Lions New Portal from Narsimha Raju Dichpally 320D.pptxLions New Portal from Narsimha Raju Dichpally 320D.pptx
Lions New Portal from Narsimha Raju Dichpally 320D.pptx
 
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait Cityin kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
 
Zone Chairperson Role and Responsibilities New updated.pptx
Zone Chairperson Role and Responsibilities New updated.pptxZone Chairperson Role and Responsibilities New updated.pptx
Zone Chairperson Role and Responsibilities New updated.pptx
 
LITTLE ABOUT LESOTHO FROM THE TIME MOSHOESHOE THE FIRST WAS BORN
LITTLE ABOUT LESOTHO FROM THE TIME MOSHOESHOE THE FIRST WAS BORNLITTLE ABOUT LESOTHO FROM THE TIME MOSHOESHOE THE FIRST WAS BORN
LITTLE ABOUT LESOTHO FROM THE TIME MOSHOESHOE THE FIRST WAS BORN
 
Introduction to Artificial intelligence.
Introduction to Artificial intelligence.Introduction to Artificial intelligence.
Introduction to Artificial intelligence.
 
BEAUTIFUL PLACES TO VISIT IN LESOTHO.pptx
BEAUTIFUL PLACES TO VISIT IN LESOTHO.pptxBEAUTIFUL PLACES TO VISIT IN LESOTHO.pptx
BEAUTIFUL PLACES TO VISIT IN LESOTHO.pptx
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
 
History of Morena Moshoeshoe birth death
History of Morena Moshoeshoe birth deathHistory of Morena Moshoeshoe birth death
History of Morena Moshoeshoe birth death
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatment
 
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
 
Digital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of DrupalDigital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of Drupal
 
ICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdfICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdf
 
Ready Set Go Children Sermon about Mark 16:15-20
Ready Set Go Children Sermon about Mark 16:15-20Ready Set Go Children Sermon about Mark 16:15-20
Ready Set Go Children Sermon about Mark 16:15-20
 
BIG DEVELOPMENTS IN LESOTHO(DAMS & MINES
BIG DEVELOPMENTS IN LESOTHO(DAMS & MINESBIG DEVELOPMENTS IN LESOTHO(DAMS & MINES
BIG DEVELOPMENTS IN LESOTHO(DAMS & MINES
 

Querying and reasoning over large scale building datasets: an outline of a performance benchmark

  • 1. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Querying and reasoning over large scale building datasets: an outline of a performance benchmark Pieter Pauwels, Tarcisio Mendes de Farias, Chi Zhang, Ana Roxin, Jakob Beetz, Jos De Roo, Christophe Nicolle International Workshop on Semantic Big Data (SBD 2016) in conjunction with the 2016 ACM SIGMOD Conference in San Francisco, USA
  • 2. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Agenda Introduction • Context description • Problem identified Testing environment • ifcOWL and building models • Rules and queries • Triple stores Results • Query performance • Additional findings Querying and reasoning over large scale building datasets: an outline of a performance benchmark 2 July 1st, 2016
  • 3. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Context description ◼ The architectural design and construction domains work on a daily basis with massive amounts of data. ◼ In the context of BIM, a neutral, interoperable representation of information consists in the Industry Foundation Classes (IFC) standard Difficult to handle the EXPRESS format ◼ Semantic Web technologies have been identified as a possible solution Semantic data enrichment Schema and data transformations ◼ A semantic approach involves 3 main components: Schema (Tbox) • OWL ontology • Information structure Instances (ABox) • Assertions • Respects schema definition Rules (RBox) • If-Then statements • Involving elements from the ABox and theTBox Querying and reasoning over large scale building datasets: an outline of a performance benchmark 3 July 1st, 2016
  • 4. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Problem identified ◼ Different implementations exist for the components (TBox, ABox, RBox) of such Semantic approach Diverse reasoning engines Diverse query processing techniques Diverse query handling Diverse dataset size Diverse dataset complexity ◼ Missing an appropriate rule and query execution performance benchmark Expressiveness vs. performance Querying and reasoning over large scale building datasets: an outline of a performance benchmark 4 July 1st, 2016
  • 5. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Performance benchmark variables ◼ Main components ◼ These elements are implemented into 3 different systems SPIN (SPARQL Inference Notation) and Jena EYE Stardog ◼ An ensemble of queries is addressed to the so-created systems Schema (TBox) • ifcOWL Instances (ABox) • 369 ifcOWL- compliant building models Rules (RBox) • 68 data transformation rules Querying and reasoning over large scale building datasets: an outline of a performance benchmark 5 July 1st, 2016
  • 6. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be TBox - the ifcOWL ontology ◼ All building models are encoded using the ifcOWL ontology Built up under the impulse of numerous initiatives during the last 10 years ◼ The ontology used is the one that is made publicly available by the buildingSMART Linked Data Working Group (LDWG) http://ifcowl.openbimstandards.org/IFC4# http://ifcowl.openbimstandards.org/IFC4_ADD1# http://ifcowl.openbimstandards.org/IFC2X3_TC1# http://ifcowl.openbimstandards.org/IFC2X3_Final# Querying and reasoning over large scale building datasets: an outline of a performance benchmark 6 July 1st, 2016
  • 7. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Call for papers – special issue in SWJ ◼ Semantic Web Journal – Interoperability, Usability, Applicability http://www.semantic-web-journal.net ◼ Special issue on "Semantic Technologies and Interoperability in the Built Environment" ◼ Important dates March, 1st 2017 – paper submission deadline May 1st 2017 – notification of acceptance Ontologies for AEC/FM Linking BIM models to external data sources Multiple scale integration through semanitc interoperability Multilingual data access and annotation Query processing, query performance Semantic-based building monitoring systems Reasoning with building data Building data publication strategies Big Linked Data for building information Querying and reasoning over large scale building datasets: an outline of a performance benchmark 7 July 1st, 2016
  • 8. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be ifcOWL Stats July 1st, 2016 Querying and reasoning over large scale building datasets: an outline of a performance benchmark 8 Axioms 21306 Logical Axioms 13649 Classes 1230 Object properties 1578 Data properties 5 Individuals 1627 DL expressivity SROIQ(D) SubClassOf axioms 4622 EquivalentClasses axioms 266 DisjointClasses axioms 2429 SubObjectPropertyOf axioms 1 InverseObjectProperties axioms 94 FunctionalObjectProperty axioms 1441 TransitiveObjectProperty axioms 1 ObjectPropertyDomain axioms 1577 ObjectPropertyRange axioms 1576 FunctionalDataProperty axioms 5 DataPropertyDomain axioms 5 DataPropertyRange axioms 5 Pieter Pauwels and Walter Terkaj, EXPRESS to OWL for construction industry: towards a recommendable and usable ifcOWL ontology. Automation in Construction 63: 100-133 (2016).
  • 9. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be ABox – Building sets ◼ Some BIM models are publicly available (364), whereas other are undisclosed (5) Building information models created with different BIM modelling environments Exported to IFC2x3 Transformed into ifcOWL- compliant RDF graphs using a publicly available converter BIM environment Number of files Tekla Structures 227 (61,5%) unknown or manual 38 (10,3%) Autodesk Revit 27 (7,3%) Xella BIM 15 Autodesk AutoCAD 12 iTConcrete 9 SDS 8 Nemetschek AllPlan 7 GraphiSoft ArchiCAD 5 Various others 21 IFC instances Average file size Number of files 0 – 500,000 0 – 30 MB 321 500,000 – 2,000,000 30 – 100 MB 37 > 2,000,000 > 100 MB 11 Querying and reasoning over large scale building datasets: an outline of a performance benchmark 9 July 1st, 2016
  • 10. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be RBox – Data transformation rules ◼ Need for a representative set of rewrite rules ◼ 68 manually built rules ◼ Classified in several rule sets according to their content Rule Set (RS) Description RS1 Contains 2 rules for rewriting property set references into additional property statements sbd:hasPropertySet and sbd:hasProperty. This is a small, yet often used rule set that can be used in many contexts to simplify querying and data publication of common simple properties attached to IFC entity instances. RS2 Includes 31 rules, all involving subtypes of the IfcRelationship class (e.g. ifcowl:IfcRelAssigns, ifcowl:IfcRelDecomposes, ifcowl:IfcRelAssociates, ifcowl:IfcRelDefines, ifcowl:IfcRelConnects) RS3 Contains 3 rules related to handling lists in IFC. RS4 Contains one rule that allows wrapping simple data types. RS5 Consists of 20 rules for inferring single property statements sbd:hasPropertySet and sbd:hasProperty. RS6 Extends RS5 and RS1 with 6 additional rules for inferring whether an objet is internal or external to a building. RS7 Contains 7 rules dealing with the (de)composition of building spaces and spatial elements. Querying and reasoning over large scale building datasets: an outline of a performance benchmark 10 July 1st, 2016
  • 11. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be ifcOWL Example Transformation July 1st, 2016 Querying and reasoning over large scale building datasets: an outline of a performance benchmark 11 inst:IfcWindow_1893 inst:IfcWindow_1842 inst:IfcWallStandardCase_696 sbim:hasWindowsbim:hasWindow
  • 12. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Implementation • Implemented based on the open source APIs of Topbraid SPIN (SPIN API 1.4.0) and Apache Jena (Jena Core 2.11.0, Jena ARQ 2.11.0, Jena TDB 1.0.0) • Rules are written with Topbraid Composer Free version, and they are exported as RDF Turtle files. • A small Java program is implemented to read RDF models, schema, rules from the TDB store and query data. • All the SPARQL queries are configured using the Jena org.apache.jena.sparql.algebra package • To avoid unnecessary reasoning processes, in this test environment only the RDFS vocabulary is supported. SPIN + Jena TDB • Version ‘EYE- Winter16.0302.1557’ (‘SWI- Prolog 7.2.3 (amd64): Aug 25 2015, 12:24:59’). • EYE is a semi-backward reasoner enhanced with Euler path detection. • As our rule set currently contains only rules using =>, forward reasoning will take place. • Each command is executed 5 times • Each command includes the full ontology, the full set of rules and the RDFS vocabulary, as well as one of the 369 building model files and one of the 3 query files. • No triple store is used: triples are processed directly from the considered files. EYE • 4.0.2 Stardog semantic graph database (Java 8, RDF 1.1 graph data model, OWL2 profiles, SPARQL 1.1) • OWL reasoner + rule engine. • Support of SWRL rules, backward-chaining reasoning • Reasoning is performed by applying a query rewriting approach (SWRL rules are taken into account during the query rewriting process). • Stardog allows attaining a DL- expressivity level of SROIQ(D). • In this approach, SWRL rules are taken into account during the query rewriting process. Stardog Querying and reasoning over large scale building datasets: an outline of a performance benchmark 12 July 1st, 2016
  • 13. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Queries ◼ We have built a limited list of 60 queries, each of which triggers at least one of the available rules. ◼ As we focus here on query execution performance, the considered queries are entirely based on the right-hand sides of the considered rules. ◼ 3 queries: Q1 a simple query with little results, Q2 a simple query with many results, and Q3 a complex query that triggers a considerable number of rules Query Query Contents Q1 ?obj sbd:hasProperty ?p Q2 ?point sbd:hasCoordinateX ?x . ?point sbd:hasCoordinateY ?y . ?point sbd:hasCoordinateZ ?z Q3 ?d rdf:type sbd:ExternalWall Querying and reasoning over large scale building datasets: an outline of a performance benchmark 13 July 1st, 2016
  • 14. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Test environment ◼ In one central server Supplied by the University of Burgundy, research group CheckSem, Following specifications: Ubuntu OS, Intel Xeon CPU E5-2430 at 2.2GHz, 6 cores and 16GB of DDR3 RAM memory ◼ 3 Virtual Machines (VMs) were set up in this central server SPIN VM (Jena TDB), EYE VM (EYE inference engine), Stardog VM (Stardog triplestore) ◼ The VMs were managed as separate test environments and Each of these VMs had 2 cores out of 6 allocated Each contained the above resources (ontologies, data, rules, queries). Querying and reasoning over large scale building datasets: an outline of a performance benchmark 14 July 1st, 2016
  • 15. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Results ◼ Queries applied on 6 hand- picked building models of varying size ◼ In the SPIN approach For Q1 and Q2, the execution time = backward- chaining inference process + actual query execution time For Q3, execution time = query execution time itself ◼ In the EYE approach Networking time is ignored ◼ In the Stardog approach Execution time = backward- chaining inference + actual query execution time Query Building Model SPIN (s) EYE (s) Stardog (s) Q1 (simple, little results) BM1 135,36 37,11 13,44 BM2 1,47 0,29 0,17 BM3 24,01 4,87 1,4 BM4 41,28 12,95 3,55 BM5 4,99 1,05 0,33 BM6 0,55 0,16 0,08 Q2 (simple, many results) BM1 46,17 2,10 6,82 BM2 92,03 4,20 15,83 BM3 82,68 4,12 15,28 BM4 19,93 1,04 2,81 BM5 3,69 0,21 1,36 BM6 0,74 0,045 1,00 Q3 (complex) BM1 0,001 0,001 0,07 BM2 0,006 0,003 0,12 BM3 0,002 0,003 0,31 BM4 0,005 0,001 0,20 BM5 0,006 0,013 0,20 BM6 0,001 0,001 0,13 Querying and reasoning over large scale building datasets: an outline of a performance benchmark 15 July 1st, 2016
  • 16. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Query time related to result count For Q1 for each of the considered approaches (green = SPIN; blue = EYE; black = Stardog) For Q2 for each of the considered approaches (green = SPIN; blue = EYE; black = Stardog) Querying and reasoning over large scale building datasets: an outline of a performance benchmark 16 July 1st, 2016
  • 17. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Additional findings • The three considered procedures are quite far apart from each other, explaining the considerable performance differences, not only between the procedures, but also between diverse usages within one and the same system. • Algorithms and optimization techniques used for each approach aren't entirely used: differences in indexation algorithms, query rewriting techniques and rule handling strategies used. Indexing algorithms, query rewriting techniques, and rule handling strategies • The disadvantage of forward-chaining reasoning process is that millions of triples can be materialized (EYE, SPIN for Q1 and Q2) • Using backward-chaining reasoning allows avoiding triple materialization, thus saving query execution time (Stardog, SPIN for Q3). Forward- versus backward-chaining • Query Q3 triggers a rule that in turn triggers several other rules in the rule set. If the first rule does not fire, however, the process stops early. • Query Q2, however, fires relatively long rules. It takes more time to make these matches in all three approaches. Type of data in the building model • Loading files in memory at query execution time leads to considerable delays. Impact of the triple store • Linear relation: the more results are available, the more triples need to be matched, leading to more assertions. Impact of the number of output results Querying and reasoning over large scale building datasets: an outline of a performance benchmark 17 July 1st, 2016
  • 18. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Conclusion and future work ◼ Comparison of 3 different approaches SPIN, EYE and Stardog ◼ 3 queries applied over 6 different building models ◼ Future work consists in Specifying more this initial performance benchmark with additional data and rules Executing additional queries on the rest of the set of building models Comparing results on a wider scale: ― for the individual approaches separately, ― as well as with other approaches not considered here. Querying and reasoning over large scale building datasets: an outline of a performance benchmark 18 July 1st, 2016
  • 19. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Thank you for your attention. Pieter Pauwels, Tarcisio Mendes de Farias, Chi Zhang, Ana Roxin, Jakob Beetz, Jos De Roo, Christophe Nicolle International Workshop on Semantic Big Data (SBD 2016) in conjunction with the 2016 ACM SIGMOD Conference in San Francisco, USA
  • 20. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Query Building Model SPIN EYE Stardog Q1 BM1 135,36 37,11 13,44 BM2 1,47 0,29 0,17 BM3 24,01 4,87 1,4 BM4 41,28 12,95 3,55 BM5 4,99 1,05 0,33 BM6 0,55 0,16 0,08 Q2 BM1 46,17 2,10 6,82 BM2 92,03 4,20 15,83 BM3 82,68 4,12 15,28 BM4 19,93 1,04 2,81 BM5 3,69 0,21 1,36 BM6 0,74 0,045 1,00 Q3 BM1 0,001 0,001 0,07 BM2 0,006 0,003 0,12 BM3 0,002 0,003 0,31 BM4 0,005 0,001 0,20 BM5 0,006 0,013 0,20 BM6 0,001 0,001 0,13 Querying and reasoning over large scale building datasets: an outline of a performance benchmark 20 July 1st, 2016