SlideShare une entreprise Scribd logo
1  sur  19
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Querying and reasoning over
large scale building datasets: an outline of
a performance benchmark
Pieter Pauwels, Tarcisio Mendes deFarias, ChiZhang, Ana Roxin, Jakob Beetz, Jos De Roo,
Christophe Nicolle
International Workshop on Semantic Big Data (SBD 2016)
in conjunction with the 2016 ACM SIGMOD Conference in San Francisco, USA
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Agenda
Introduction
• Contextdescription
• Problemidentified
Testing environment
• ifcOWLandbuildingmodels
• Rulesandqueries
• Triplestores
Results
• Query performance
• Additionalfindings
Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark
2
July1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Context description
◼ The architecturaldesign andconstructiondomainsworkonadailybasis withmassiveamountsof
data.
◼ In the contextofBIM, aneutral,interoperablerepresentationofinformationconsistsin the Industry
FoundationClasses(IFC) standard
 Difficult to handlethe EXPRESS format
◼ SemanticWebtechnologies havebeen identifiedas apossible solution
 Semantic data enrichment
 Schema and data transformations
◼ A semanticapproachinvolves 3 maincomponents:
Schema (Tbox)
• OWL ontology
• Informationstructure
Instances (ABox)
• Assertions
• Respectsschema definition
Rules(RBox)
• If-Thenstatements
• Involving elementsfrom the
ABoxandtheTBox
Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark
3
July1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Problem identified
◼ Differentimplementationsexist forthecomponents(TBox,ABox,RBox)ofsuch Semanticapproach
 Diverse reasoning engines
 Diverse queryprocessing techniques
 Diverse queryhandling
 Diverse dataset size
 Diverse dataset complexity
◼ Missing anappropriaterule andqueryexecutionperformancebenchmark
Expressiveness vs.
performance
Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark
4
July1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Performance benchmark variables
◼ Main components
◼ Theseelements areimplemented into3differentsystems
 SPIN(SPARQL InferenceNotation) and Jena
 EYE
 Stardog
◼ An ensemble ofqueries isaddressedtotheso-createdsystems
Schema
(TBox)
• ifcOWL
Instances(ABox)
• 369ifcOWL-
compliantbuilding
models
Rules
(RBox)
• 68 data
transformationrules
Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark
5
July1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
TBox - the ifcOWL ontology
◼ All building modelsareencoded usingtheifcOWLontology
 Built up underthe impulse of numerousinitiatives during the last 10years
◼ The ontologyused isthe onethatis madepublicly availablebythe buildingSMARTLinked Data
Working Group(LDWG)
 http://ifcowl.openbimstandards.org/IFC4#
 http://ifcowl.openbimstandards.org/IFC4_ADD1#
 http://ifcowl.openbimstandards.org/IFC2X3_TC1#
 http://ifcowl.openbimstandards.org/IFC2X3_Final#
Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark
6
July1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
ifcOWL Stats
July1st, 2016 Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark
7
Axioms 21306
Logical Axioms 13649
Classes 1230
Object properties 1578
Data properties 5
Individuals 1627
DL expressivity SROIQ(D)
SubClassOf axioms 4622
EquivalentClasses axioms 266
DisjointClasses axioms 2429
SubObjectPropertyOf axioms 1
InverseObjectProperties axioms 94
FunctionalObjectProperty axioms 1441
TransitiveObjectProperty axioms 1
ObjectPropertyDomain axioms 1577
ObjectPropertyRange axioms 1576
FunctionalDataProperty axioms 5
DataPropertyDomain axioms 5
DataPropertyRange axioms 5
Pieter Pauwels and Walter Terkaj, EXPRESS to OWL for
construction industry: towards a recommendable and usable
ifcOWL ontology. Automation in Construction 63: 100-133 (2016).
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Call for papers – special issue in SWJ
◼ SemanticWebJournal– Interoperability,Usability,Applicability
 http://www.semantic-web-journal.net
◼ Specialissue on"SemanticTechnologies andInteroperability intheBuilt Environment"
◼ Importantdates
 March, 1st 2017– paper submission deadline
 May 1st 2017– notification of acceptance
OntologiesforAEC/FM
LinkingBIM modelsto
externaldatasources
Multiplescaleintegration
throughsemanitc
interoperability
Multilingualdataaccess
andannotation
Queryprocessing,query
performance
Semantic-basedbuilding
monitoringsystems
Reasoningwithbuilding
data
Buildingdatapublication
strategies
BigLinkedDatafor
buildinginformation
Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark
8
July1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
ABox – Building sets
◼ SomeBIMmodelsarepubliclyavailable(364), whereasotherareundisclosed (5)
Buildinginformationmodelscreatedwith
differentBIM modellingenvironments
Exported to IFC2x3
Transformed intoifcOWL-compliantRDF
graphs usinga publiclyavailableconverter
BIM environment Number of files
TeklaStructures 227(61,5%)
unknownor manual 38 (10,3%)
AutodeskRevit 27 (7,3%)
XellaBIM 15
AutodeskAutoCAD 12
iTConcrete 9
SDS 8
NemetschekAllPlan 7
GraphiSoftArchiCAD 5
Variousothers 21
IFC instances Average file size
0 – 500,000 0 – 30 MB
500,000– 2,000,000 30 – 100MB
> 2,000,000 > 100MB
Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark
9
July1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
RBox – Data transformation rules
◼ Need forarepresentativesetof rewriterules
◼ 68manuallybuiltrules
◼ Classified in several rule setsaccordingtotheir content
Rule Set
(RS)
Description
RS1
Contains2 rulesforrewritingpropertysetreferencesintoadditionalpropertystatements sbd:hasPropertySet and
sbd:hasProperty. Thisis a small, yetoftenusedrulesetthat can beusedin manycontextsto simplifyqueryinganddatapublicationof common
simplepropertiesattachedtoIFC entityinstances.
RS2
Includes31 rules,allinvolving subtypesofthe IfcRelationship class(e.g. ifcowl:IfcRelAssigns,
ifcowl:IfcRelDecomposes, ifcowl:IfcRelAssociates, ifcowl:IfcRelDefines,
ifcowl:IfcRelConnects)
RS3 Contains3 rulesrelatedto handlinglistsin IFC.
RS4 Containsone rulethat allowswrappingsimpledatatypes.
RS4 Consistsof20 rulesforinferring singlepropertystatements sbd:hasPropertySet andsbd:hasProperty.
RS6 ExtendsRS5 andRS1with 6 additionalrulesforinferringwhetheranobjetis internalorexternalto a building.
RS7 Contains7 rulesdealingwith the (de)compositionof buildingspacesandspatialelements.
Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark
10
July1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
ifcOWL Example Transformation
July1st, 2016 Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark
11
inst:IfcWindow_1893 inst:IfcWindow_1842
inst:IfcWallStandardCase_696
sbim:hasWindowsbim:hasWindow
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Implementation
• Implementedbased ontheopen source APIs
of Topbraid SPIN(SPIN API 1.4.0)and Apache
Jena (Jena Core 2.11.0,Jena ARQ2.11.0,Jena
TDB 1.0.0)
• Rulesare writtenwith TopbraidComposer
Freeversion,and theyare exportedas RDF
Turtlefiles.
• A smallJava program isimplementedto read
RDF models,schema,rulesfrom the TDB store
and query data.
• AlltheSPARQLqueriesareconfigured using
theJena org.apache.jena.sparql.algebra
package
• To avoid unnecessaryreasoningprocesses,in
thistestenvironment onlythe RDFS
vocabulary issupported.
SPIN+ JenaTDB
• Version‘EYE-Winter16.0302.1557’(‘SWI-
Prolog 7.2.3 (amd64):Aug 252015,
12:24:59’).
• EYE isa semi-backwardreasonerenhanced
with Eulerpath detection.
• Asour rulesetcurrently containsonly rules
using=>, forward reasoningwilltake place.
• Each command is executed5 times
• Each command includesthefullontology, the
fullsetof rulesand theRDFS vocabulary, as
wellasone of the 369buildingmodel filesand
one of the3 query files.
• Notriplestoreisused:triplesare processed
directlyfrom theconsideredfiles.
EYE
• 4.0.2Stardogsemanticgraphdatabase(Java 8,
RDF 1.1graph datamodel, OWL2profiles,
SPARQL1.1)
• OWL reasoner+rule engine.
• Support of SWRLrules,backward-chaining
reasoning
• Reasoningis performedby applyinga query
rewritingapproach (SWRLrulesare taken into
account during thequery rewritingprocess).
• Stardogallowsattaininga DL-expressivity
levelof SROIQ(D).
• Inthisapproach, SWRLrulesare taken into
account during thequery rewritingprocess.
Stardog
Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark
12
July1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Queries
◼ We have builta limited listof60 queries,eachofwhichtriggersatleast oneof theavailablerules.
◼ As we focushereon queryexecutionperformance,the consideredqueriesareentirelybasedon the
right-handsides ofthe consideredrules.
◼ 3 queries:
 Q1 a simple querywith little results,
 Q2 a simple querywith manyresults,
 and Q3 a complex querythat triggers a considerable numberof rules
Query Query Contents
Q1 ?obj sbd:hasProperty ?p
Q2
?point sbd:hasCoordinateX ?x .
?point sbd:hasCoordinateY ?y .
?point sbd:hasCoordinateZ ?z
Q3 ?d rdf:type sbd:ExternalWall
Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark
13
July1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Test environment
◼ In onecentralserver
 Supplied by the Universityof Burgundy,researchgroup CheckSem,
 Following specifications: UbuntuOS, Intel Xeon CPU E5-2430at 2.2GHz,6 coresand 16GB of DDR3
RAMmemory
◼ 3 VirtualMachines(VMs) wereset up in thiscentralserver
 SPIN VM (Jena TDB), EYE VM(EYE inferenceengine), Stardog VM(Stardog triplestore)
◼ The VMsweremanagedas separatetestenvironments and
 Each of these VMs had 2 coresout of 6 allocated
 Each containedthe above resources (ontologies, data, rules, queries).
Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark
14
July1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Results
◼ Queriesappliedon 6hand-picked
building models ofvaryingsize
◼ In theSPINapproach
 For Q1 and Q2, the execution time =
backward-chaininginference process +
actual query execution time
 For Q3, execution time = queryexecution
time itself
◼ In the EYEapproach
 Networktraffic time is ignored
◼ In the Stardogapproach
 Execution time = backward-chaining
inference+ actual queryexecution time
Query
Building
Model
SPIN
(s)
EYE
(s)
Stardog(s)
Q1
(simple,little
results)
BM1 135,36 37,11 13,44
BM2 1,47 0,29 0,17
BM3 24,01 4,87 1,4
BM4 41,28 12,95 3,55
BM5 4,99 1,05 0,33
BM6 0,55 0,16 0,08
Q2
(simple,many
results)
BM1 46,17 2,10 6,82
BM2 92,03 4,20 15,83
BM3 82,68 4,12 15,28
BM4 19,93 1,04 2,81
BM5 3,69 0,21 1,36
BM6 0,74 0,045 1,00
Q3
(complex)
BM1 0,001 0,001 0,07
BM2 0,006 0,003 0,12
BM3 0,002 0,003 0,31
BM4 0,005 0,001 0,20
BM5 0,006 0,013 0,20
BM6 0,001 0,001 0,13
Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark
15
July1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Query time related to result count
For Q1 for each of the considered
approaches
(green = SPIN; blue = EYE; black = Stardog)
For Q2 for each of the considered
approaches
(green = SPIN; blue = EYE; black = Stardog)
Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark
16
July1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Additional findings
• The three considered procedures arequite farapartfrom each other, explaining the considerable performance differences, not only between the
procedures, but alsobetween diverse usages within one and the samesystem.
• Algorithms and optimization techniques used for each approach aren't entirely used: differences in indexation algorithms, query rewriting techniques
and rule handling strategies used.
Indexing algorithms, query rewriting techniques, and rule handling strategies
• The disadvantage of forward-chaining reasoning process is that millions of triples can bematerialized (EYE, SPIN forQ1 and Q2)
• Using backward-chaining reasoning allows avoiding triple materialization, thus saving query execution time (Stardog, SPIN forQ3).
Forward- versus backward-chaining
• Query Q3 triggers a rule that in turn triggers several other rules in the rule set. If the firstrule does not fire, however, the process stops early.
• Query Q2, however, fires relatively long rules. It takes more time to make these matches in all three approaches.
Typeof data in the building model
• Loading files in memory at query execution time leads to considerable delays.
Impact of the triple store
• Linear relation: the more results areavailable, the more triples need to bematched, leading to more assertions.
Impact of the number of output results
Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark
17
July1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Conclusion and future work
◼ Comparisonof3 differentapproaches
 SPIN, EYE andStardog
◼ 3 queriesappliedover 6 differentbuilding models
◼ Futureworkconsistsin
 Specifying morethis initial performancebenchmarkwith additional data and rules
 Executing additional queries on the rest of the set of building models
 Comparingresults ona wider scale:
―forthe individual approaches separately,
―as well as with other approaches not considered here.
Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark
18
July1st, 2016
AnaROXIN–ana-maria.roxin@u-bourgogne.fr
PieterPAUWELS–Pieter.pauwels@ugent.be
Thank you for your attention.
Pieter Pauwels, Tarcisio Mendes deFarias, ChiZhang,
Ana Roxin, JakobBeetz, Jos De Roo, Christophe Nicolle
International Workshop on Semantic Big Data (SBD 2016)
in conjunction with the 2016 ACM SIGMOD Conference in San Francisco, USA

Contenu connexe

Tendances

SustainablePlaces_ifcOWL_applications_2015-09-17
SustainablePlaces_ifcOWL_applications_2015-09-17SustainablePlaces_ifcOWL_applications_2015-09-17
SustainablePlaces_ifcOWL_applications_2015-09-17
Pieter Pauwels
 
CH2009 - Architectural information modelling in construction history
CH2009 - Architectural information modelling in construction historyCH2009 - Architectural information modelling in construction history
CH2009 - Architectural information modelling in construction history
Pieter Pauwels
 

Tendances (20)

LDAC Workshop 2016 - Linked Building Data Community Efforts
LDAC Workshop 2016 - Linked Building Data Community EffortsLDAC Workshop 2016 - Linked Building Data Community Efforts
LDAC Workshop 2016 - Linked Building Data Community Efforts
 
ECPPM2016 - SemCat: Publishing and Accessing Building Product Information as ...
ECPPM2016 - SemCat: Publishing and Accessing Building Product Information as ...ECPPM2016 - SemCat: Publishing and Accessing Building Product Information as ...
ECPPM2016 - SemCat: Publishing and Accessing Building Product Information as ...
 
ifcOWL - An ontology for building data
ifcOWL - An ontology for building dataifcOWL - An ontology for building data
ifcOWL - An ontology for building data
 
BuildingSMART Standards Summit 2015 - Technical Room - Linked Data for Constr...
BuildingSMART Standards Summit 2015 - Technical Room - Linked Data for Constr...BuildingSMART Standards Summit 2015 - Technical Room - Linked Data for Constr...
BuildingSMART Standards Summit 2015 - Technical Room - Linked Data for Constr...
 
SustainablePlaces_ifcOWL_applications_2015-09-17
SustainablePlaces_ifcOWL_applications_2015-09-17SustainablePlaces_ifcOWL_applications_2015-09-17
SustainablePlaces_ifcOWL_applications_2015-09-17
 
SWIMing VoCamp 2016 - ifcOWL overview and current state
SWIMing VoCamp 2016 - ifcOWL overview and current stateSWIMing VoCamp 2016 - ifcOWL overview and current state
SWIMing VoCamp 2016 - ifcOWL overview and current state
 
CIB W78 2015 - Semantic Rule-checking for Regulation Compliance Checking
CIB W78 2015 - Semantic Rule-checking for Regulation Compliance CheckingCIB W78 2015 - Semantic Rule-checking for Regulation Compliance Checking
CIB W78 2015 - Semantic Rule-checking for Regulation Compliance Checking
 
BIM from Building to urban fabric: More than just zooming out
BIM from Building to urban fabric: More than just zooming outBIM from Building to urban fabric: More than just zooming out
BIM from Building to urban fabric: More than just zooming out
 
IoT Reference Architectures
IoT Reference ArchitecturesIoT Reference Architectures
IoT Reference Architectures
 
CAA NLFL 2015 - Semantics in the documentation of architectural heritage: BIM...
CAA NLFL 2015 - Semantics in the documentation of architectural heritage: BIM...CAA NLFL 2015 - Semantics in the documentation of architectural heritage: BIM...
CAA NLFL 2015 - Semantics in the documentation of architectural heritage: BIM...
 
CORE final workshop introduction
CORE final workshop introductionCORE final workshop introduction
CORE final workshop introduction
 
Automated Reverse-Engineering of a Cloud API
Automated Reverse-Engineering of a Cloud APIAutomated Reverse-Engineering of a Cloud API
Automated Reverse-Engineering of a Cloud API
 
PechaKucha (FormaliSE'2018)
PechaKucha (FormaliSE'2018)PechaKucha (FormaliSE'2018)
PechaKucha (FormaliSE'2018)
 
CH2009 - Architectural information modelling in construction history
CH2009 - Architectural information modelling in construction historyCH2009 - Architectural information modelling in construction history
CH2009 - Architectural information modelling in construction history
 
MoDMaCAO: Model-Driven Configuration Management of Cloud Applications with OC...
MoDMaCAO: Model-Driven Configuration Management of Cloud Applications with OC...MoDMaCAO: Model-Driven Configuration Management of Cloud Applications with OC...
MoDMaCAO: Model-Driven Configuration Management of Cloud Applications with OC...
 
Crossing the chasm between ontology engineering and application development
Crossing the chasm between ontology engineering and application developmentCrossing the chasm between ontology engineering and application development
Crossing the chasm between ontology engineering and application development
 
h5web: a web-based viewer of HDF5 files
h5web: a web-based viewer of HDF5 filesh5web: a web-based viewer of HDF5 files
h5web: a web-based viewer of HDF5 files
 
VR and AR in Logistics: VDC-Whitepaper
VR and AR in Logistics: VDC-WhitepaperVR and AR in Logistics: VDC-Whitepaper
VR and AR in Logistics: VDC-Whitepaper
 
A Precise Model for Google Cloud Platform (IC2E'2018)
A Precise Model for Google Cloud Platform (IC2E'2018)A Precise Model for Google Cloud Platform (IC2E'2018)
A Precise Model for Google Cloud Platform (IC2E'2018)
 
ICN in the IRTF and IETF
ICN in the IRTF and IETFICN in the IRTF and IETF
ICN in the IRTF and IETF
 

En vedette

2_presFriday_ontologydevelopment
2_presFriday_ontologydevelopment2_presFriday_ontologydevelopment
2_presFriday_ontologydevelopment
Pieter Pauwels
 

En vedette (14)

VLDB2015 会議報告
VLDB2015 会議報告VLDB2015 会議報告
VLDB2015 会議報告
 
The Art of Performance Evaluation
The Art of Performance EvaluationThe Art of Performance Evaluation
The Art of Performance Evaluation
 
Publish and use your data
Publish and use your dataPublish and use your data
Publish and use your data
 
CIB W78 2015 - Keynote "The Web of Construction Data:Pathways and Opportunities"
CIB W78 2015 - Keynote "The Web of Construction Data:Pathways and Opportunities"CIB W78 2015 - Keynote "The Web of Construction Data:Pathways and Opportunities"
CIB W78 2015 - Keynote "The Web of Construction Data:Pathways and Opportunities"
 
Data Interlinking
Data InterlinkingData Interlinking
Data Interlinking
 
LDAC 2015 - Towards an industry-wide ifcOWL: choices and issues
LDAC 2015 - Towards an industry-wide ifcOWL: choices and issuesLDAC 2015 - Towards an industry-wide ifcOWL: choices and issues
LDAC 2015 - Towards an industry-wide ifcOWL: choices and issues
 
Semantics for Smarter Cities
Semantics for Smarter CitiesSemantics for Smarter Cities
Semantics for Smarter Cities
 
2_presFriday_ontologydevelopment
2_presFriday_ontologydevelopment2_presFriday_ontologydevelopment
2_presFriday_ontologydevelopment
 
The SWIMing project
The SWIMing projectThe SWIMing project
The SWIMing project
 
LDAC 2015 - Selection of IFC subsets using ifcOWL and rewrite rules
LDAC 2015 - Selection of IFC subsets using ifcOWL and rewrite rulesLDAC 2015 - Selection of IFC subsets using ifcOWL and rewrite rules
LDAC 2015 - Selection of IFC subsets using ifcOWL and rewrite rules
 
BIMMeeting 2016 - BIM-Infra-GIS: building bridges from single buildings to di...
BIMMeeting 2016 - BIM-Infra-GIS: building bridges from single buildings to di...BIMMeeting 2016 - BIM-Infra-GIS: building bridges from single buildings to di...
BIMMeeting 2016 - BIM-Infra-GIS: building bridges from single buildings to di...
 
Smart Cities and Open Data
Smart Cities and Open DataSmart Cities and Open Data
Smart Cities and Open Data
 
Smart cities and open data platforms
Smart cities and open data platformsSmart cities and open data platforms
Smart cities and open data platforms
 
Manufacturers Data Moving from PDF's to digital
Manufacturers Data Moving from PDF's to digitalManufacturers Data Moving from PDF's to digital
Manufacturers Data Moving from PDF's to digital
 

Similaire à ACM SIGMOD SBD2016 - Querying and reasoning over large scale building datasets: an outline of a performance benchmark

An Overview of VIEW
An Overview of VIEWAn Overview of VIEW
An Overview of VIEW
Shiyong Lu
 

Similaire à ACM SIGMOD SBD2016 - Querying and reasoning over large scale building datasets: an outline of a performance benchmark (20)

Querying and reasoning over large scale building datasets: an outline of a pe...
Querying and reasoning over large scale building datasets: an outline of a pe...Querying and reasoning over large scale building datasets: an outline of a pe...
Querying and reasoning over large scale building datasets: an outline of a pe...
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
 
ifcWOD (Web Of Data) - Semantically Adapting IFC Model Relations into OWL Pro...
ifcWOD (Web Of Data) - Semantically Adapting IFC Model Relations into OWL Pro...ifcWOD (Web Of Data) - Semantically Adapting IFC Model Relations into OWL Pro...
ifcWOD (Web Of Data) - Semantically Adapting IFC Model Relations into OWL Pro...
 
OA centre of excellence
OA centre of excellenceOA centre of excellence
OA centre of excellence
 
Machine learning model to production
Machine learning model to productionMachine learning model to production
Machine learning model to production
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
Ontology-based data access: why it is so cool!
Ontology-based data access: why it is so cool!Ontology-based data access: why it is so cool!
Ontology-based data access: why it is so cool!
 
IBM Strategy for Spark
IBM Strategy for SparkIBM Strategy for Spark
IBM Strategy for Spark
 
WSO2 Machine Learner - Product Overview
WSO2 Machine Learner - Product OverviewWSO2 Machine Learner - Product Overview
WSO2 Machine Learner - Product Overview
 
Towards Configuration Technologies for IoT Gateways
Towards Configuration Technologies  for IoT GatewaysTowards Configuration Technologies  for IoT Gateways
Towards Configuration Technologies for IoT Gateways
 
An Overview of VIEW
An Overview of VIEWAn Overview of VIEW
An Overview of VIEW
 
Telefonica: Automatización de la gestión de redes mediante grafos
Telefonica: Automatización de la gestión de redes mediante grafosTelefonica: Automatización de la gestión de redes mediante grafos
Telefonica: Automatización de la gestión de redes mediante grafos
 
Diane Van Etten_resume
Diane Van Etten_resumeDiane Van Etten_resume
Diane Van Etten_resume
 
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
 
Pathways for EOSC-hub and MaX collaboration
Pathways for EOSC-hub and MaX collaborationPathways for EOSC-hub and MaX collaboration
Pathways for EOSC-hub and MaX collaboration
 
XDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics EcosystemXDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
 
Spark meetup v2.0.5
Spark meetup v2.0.5Spark meetup v2.0.5
Spark meetup v2.0.5
 
FIWARE Wednesday Webinars - Performing Big Data Analysis Using Cosmos With Sp...
FIWARE Wednesday Webinars - Performing Big Data Analysis Using Cosmos With Sp...FIWARE Wednesday Webinars - Performing Big Data Analysis Using Cosmos With Sp...
FIWARE Wednesday Webinars - Performing Big Data Analysis Using Cosmos With Sp...
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reports
 
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland LeusdenTestistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
 

Plus de Pieter Pauwels

EuropIA 2014 - Analysing the impact of constraints on decision-making by arch...
EuropIA 2014 - Analysing the impact of constraints on decision-making by arch...EuropIA 2014 - Analysing the impact of constraints on decision-making by arch...
EuropIA 2014 - Analysing the impact of constraints on decision-making by arch...
Pieter Pauwels
 

Plus de Pieter Pauwels (10)

FOMI2017 - A method to generate a modular ifcOWL ontology
FOMI2017 - A method to generate a modular ifcOWL ontologyFOMI2017 - A method to generate a modular ifcOWL ontology
FOMI2017 - A method to generate a modular ifcOWL ontology
 
FOMI2017 - Reusing Domain Ontologies in Linked Building Data: the Case of Bui...
FOMI2017 - Reusing Domain Ontologies in Linked Building Data: the Case of Bui...FOMI2017 - Reusing Domain Ontologies in Linked Building Data: the Case of Bui...
FOMI2017 - Reusing Domain Ontologies in Linked Building Data: the Case of Bui...
 
EG-ICE 2015 - Coping with IFC lists in the ifcOWL ontology
EG-ICE 2015 - Coping with IFC lists in the ifcOWL ontologyEG-ICE 2015 - Coping with IFC lists in the ifcOWL ontology
EG-ICE 2015 - Coping with IFC lists in the ifcOWL ontology
 
CAADFutures 2015 - Shape grammars for architectural design: the need for refr...
CAADFutures 2015 - Shape grammars for architectural design: the need for refr...CAADFutures 2015 - Shape grammars for architectural design: the need for refr...
CAADFutures 2015 - Shape grammars for architectural design: the need for refr...
 
Summer School LD4SC 2015 - ifcOWL introduction
Summer School LD4SC 2015 - ifcOWL introductionSummer School LD4SC 2015 - ifcOWL introduction
Summer School LD4SC 2015 - ifcOWL introduction
 
Summer School LD4SC 2015 - RDF(S) and SPARQL
Summer School LD4SC 2015 - RDF(S) and SPARQLSummer School LD4SC 2015 - RDF(S) and SPARQL
Summer School LD4SC 2015 - RDF(S) and SPARQL
 
EuropIA 2014 - Analysing the impact of constraints on decision-making by arch...
EuropIA 2014 - Analysing the impact of constraints on decision-making by arch...EuropIA 2014 - Analysing the impact of constraints on decision-making by arch...
EuropIA 2014 - Analysing the impact of constraints on decision-making by arch...
 
ECPPM2014 - Making SimModel information available as RDF graphs
ECPPM2014 - Making SimModel information available as RDF graphsECPPM2014 - Making SimModel information available as RDF graphs
ECPPM2014 - Making SimModel information available as RDF graphs
 
iKNOW2014 - SimModel and IFC: a short introduction to the ontologies
iKNOW2014 - SimModel and IFC: a short introduction to the ontologiesiKNOW2014 - SimModel and IFC: a short introduction to the ontologies
iKNOW2014 - SimModel and IFC: a short introduction to the ontologies
 
NordDesign2014 - Reasoning processes involved in ICT-mediated design communic...
NordDesign2014 - Reasoning processes involved in ICT-mediated design communic...NordDesign2014 - Reasoning processes involved in ICT-mediated design communic...
NordDesign2014 - Reasoning processes involved in ICT-mediated design communic...
 

Dernier

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Dernier (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

ACM SIGMOD SBD2016 - Querying and reasoning over large scale building datasets: an outline of a performance benchmark

  • 1. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Querying and reasoning over large scale building datasets: an outline of a performance benchmark Pieter Pauwels, Tarcisio Mendes deFarias, ChiZhang, Ana Roxin, Jakob Beetz, Jos De Roo, Christophe Nicolle International Workshop on Semantic Big Data (SBD 2016) in conjunction with the 2016 ACM SIGMOD Conference in San Francisco, USA
  • 2. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Agenda Introduction • Contextdescription • Problemidentified Testing environment • ifcOWLandbuildingmodels • Rulesandqueries • Triplestores Results • Query performance • Additionalfindings Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark 2 July1st, 2016
  • 3. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Context description ◼ The architecturaldesign andconstructiondomainsworkonadailybasis withmassiveamountsof data. ◼ In the contextofBIM, aneutral,interoperablerepresentationofinformationconsistsin the Industry FoundationClasses(IFC) standard  Difficult to handlethe EXPRESS format ◼ SemanticWebtechnologies havebeen identifiedas apossible solution  Semantic data enrichment  Schema and data transformations ◼ A semanticapproachinvolves 3 maincomponents: Schema (Tbox) • OWL ontology • Informationstructure Instances (ABox) • Assertions • Respectsschema definition Rules(RBox) • If-Thenstatements • Involving elementsfrom the ABoxandtheTBox Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark 3 July1st, 2016
  • 4. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Problem identified ◼ Differentimplementationsexist forthecomponents(TBox,ABox,RBox)ofsuch Semanticapproach  Diverse reasoning engines  Diverse queryprocessing techniques  Diverse queryhandling  Diverse dataset size  Diverse dataset complexity ◼ Missing anappropriaterule andqueryexecutionperformancebenchmark Expressiveness vs. performance Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark 4 July1st, 2016
  • 5. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Performance benchmark variables ◼ Main components ◼ Theseelements areimplemented into3differentsystems  SPIN(SPARQL InferenceNotation) and Jena  EYE  Stardog ◼ An ensemble ofqueries isaddressedtotheso-createdsystems Schema (TBox) • ifcOWL Instances(ABox) • 369ifcOWL- compliantbuilding models Rules (RBox) • 68 data transformationrules Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark 5 July1st, 2016
  • 6. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be TBox - the ifcOWL ontology ◼ All building modelsareencoded usingtheifcOWLontology  Built up underthe impulse of numerousinitiatives during the last 10years ◼ The ontologyused isthe onethatis madepublicly availablebythe buildingSMARTLinked Data Working Group(LDWG)  http://ifcowl.openbimstandards.org/IFC4#  http://ifcowl.openbimstandards.org/IFC4_ADD1#  http://ifcowl.openbimstandards.org/IFC2X3_TC1#  http://ifcowl.openbimstandards.org/IFC2X3_Final# Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark 6 July1st, 2016
  • 7. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be ifcOWL Stats July1st, 2016 Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark 7 Axioms 21306 Logical Axioms 13649 Classes 1230 Object properties 1578 Data properties 5 Individuals 1627 DL expressivity SROIQ(D) SubClassOf axioms 4622 EquivalentClasses axioms 266 DisjointClasses axioms 2429 SubObjectPropertyOf axioms 1 InverseObjectProperties axioms 94 FunctionalObjectProperty axioms 1441 TransitiveObjectProperty axioms 1 ObjectPropertyDomain axioms 1577 ObjectPropertyRange axioms 1576 FunctionalDataProperty axioms 5 DataPropertyDomain axioms 5 DataPropertyRange axioms 5 Pieter Pauwels and Walter Terkaj, EXPRESS to OWL for construction industry: towards a recommendable and usable ifcOWL ontology. Automation in Construction 63: 100-133 (2016).
  • 8. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Call for papers – special issue in SWJ ◼ SemanticWebJournal– Interoperability,Usability,Applicability  http://www.semantic-web-journal.net ◼ Specialissue on"SemanticTechnologies andInteroperability intheBuilt Environment" ◼ Importantdates  March, 1st 2017– paper submission deadline  May 1st 2017– notification of acceptance OntologiesforAEC/FM LinkingBIM modelsto externaldatasources Multiplescaleintegration throughsemanitc interoperability Multilingualdataaccess andannotation Queryprocessing,query performance Semantic-basedbuilding monitoringsystems Reasoningwithbuilding data Buildingdatapublication strategies BigLinkedDatafor buildinginformation Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark 8 July1st, 2016
  • 9. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be ABox – Building sets ◼ SomeBIMmodelsarepubliclyavailable(364), whereasotherareundisclosed (5) Buildinginformationmodelscreatedwith differentBIM modellingenvironments Exported to IFC2x3 Transformed intoifcOWL-compliantRDF graphs usinga publiclyavailableconverter BIM environment Number of files TeklaStructures 227(61,5%) unknownor manual 38 (10,3%) AutodeskRevit 27 (7,3%) XellaBIM 15 AutodeskAutoCAD 12 iTConcrete 9 SDS 8 NemetschekAllPlan 7 GraphiSoftArchiCAD 5 Variousothers 21 IFC instances Average file size 0 – 500,000 0 – 30 MB 500,000– 2,000,000 30 – 100MB > 2,000,000 > 100MB Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark 9 July1st, 2016
  • 10. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be RBox – Data transformation rules ◼ Need forarepresentativesetof rewriterules ◼ 68manuallybuiltrules ◼ Classified in several rule setsaccordingtotheir content Rule Set (RS) Description RS1 Contains2 rulesforrewritingpropertysetreferencesintoadditionalpropertystatements sbd:hasPropertySet and sbd:hasProperty. Thisis a small, yetoftenusedrulesetthat can beusedin manycontextsto simplifyqueryinganddatapublicationof common simplepropertiesattachedtoIFC entityinstances. RS2 Includes31 rules,allinvolving subtypesofthe IfcRelationship class(e.g. ifcowl:IfcRelAssigns, ifcowl:IfcRelDecomposes, ifcowl:IfcRelAssociates, ifcowl:IfcRelDefines, ifcowl:IfcRelConnects) RS3 Contains3 rulesrelatedto handlinglistsin IFC. RS4 Containsone rulethat allowswrappingsimpledatatypes. RS4 Consistsof20 rulesforinferring singlepropertystatements sbd:hasPropertySet andsbd:hasProperty. RS6 ExtendsRS5 andRS1with 6 additionalrulesforinferringwhetheranobjetis internalorexternalto a building. RS7 Contains7 rulesdealingwith the (de)compositionof buildingspacesandspatialelements. Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark 10 July1st, 2016
  • 11. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be ifcOWL Example Transformation July1st, 2016 Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark 11 inst:IfcWindow_1893 inst:IfcWindow_1842 inst:IfcWallStandardCase_696 sbim:hasWindowsbim:hasWindow
  • 12. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Implementation • Implementedbased ontheopen source APIs of Topbraid SPIN(SPIN API 1.4.0)and Apache Jena (Jena Core 2.11.0,Jena ARQ2.11.0,Jena TDB 1.0.0) • Rulesare writtenwith TopbraidComposer Freeversion,and theyare exportedas RDF Turtlefiles. • A smallJava program isimplementedto read RDF models,schema,rulesfrom the TDB store and query data. • AlltheSPARQLqueriesareconfigured using theJena org.apache.jena.sparql.algebra package • To avoid unnecessaryreasoningprocesses,in thistestenvironment onlythe RDFS vocabulary issupported. SPIN+ JenaTDB • Version‘EYE-Winter16.0302.1557’(‘SWI- Prolog 7.2.3 (amd64):Aug 252015, 12:24:59’). • EYE isa semi-backwardreasonerenhanced with Eulerpath detection. • Asour rulesetcurrently containsonly rules using=>, forward reasoningwilltake place. • Each command is executed5 times • Each command includesthefullontology, the fullsetof rulesand theRDFS vocabulary, as wellasone of the 369buildingmodel filesand one of the3 query files. • Notriplestoreisused:triplesare processed directlyfrom theconsideredfiles. EYE • 4.0.2Stardogsemanticgraphdatabase(Java 8, RDF 1.1graph datamodel, OWL2profiles, SPARQL1.1) • OWL reasoner+rule engine. • Support of SWRLrules,backward-chaining reasoning • Reasoningis performedby applyinga query rewritingapproach (SWRLrulesare taken into account during thequery rewritingprocess). • Stardogallowsattaininga DL-expressivity levelof SROIQ(D). • Inthisapproach, SWRLrulesare taken into account during thequery rewritingprocess. Stardog Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark 12 July1st, 2016
  • 13. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Queries ◼ We have builta limited listof60 queries,eachofwhichtriggersatleast oneof theavailablerules. ◼ As we focushereon queryexecutionperformance,the consideredqueriesareentirelybasedon the right-handsides ofthe consideredrules. ◼ 3 queries:  Q1 a simple querywith little results,  Q2 a simple querywith manyresults,  and Q3 a complex querythat triggers a considerable numberof rules Query Query Contents Q1 ?obj sbd:hasProperty ?p Q2 ?point sbd:hasCoordinateX ?x . ?point sbd:hasCoordinateY ?y . ?point sbd:hasCoordinateZ ?z Q3 ?d rdf:type sbd:ExternalWall Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark 13 July1st, 2016
  • 14. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Test environment ◼ In onecentralserver  Supplied by the Universityof Burgundy,researchgroup CheckSem,  Following specifications: UbuntuOS, Intel Xeon CPU E5-2430at 2.2GHz,6 coresand 16GB of DDR3 RAMmemory ◼ 3 VirtualMachines(VMs) wereset up in thiscentralserver  SPIN VM (Jena TDB), EYE VM(EYE inferenceengine), Stardog VM(Stardog triplestore) ◼ The VMsweremanagedas separatetestenvironments and  Each of these VMs had 2 coresout of 6 allocated  Each containedthe above resources (ontologies, data, rules, queries). Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark 14 July1st, 2016
  • 15. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Results ◼ Queriesappliedon 6hand-picked building models ofvaryingsize ◼ In theSPINapproach  For Q1 and Q2, the execution time = backward-chaininginference process + actual query execution time  For Q3, execution time = queryexecution time itself ◼ In the EYEapproach  Networktraffic time is ignored ◼ In the Stardogapproach  Execution time = backward-chaining inference+ actual queryexecution time Query Building Model SPIN (s) EYE (s) Stardog(s) Q1 (simple,little results) BM1 135,36 37,11 13,44 BM2 1,47 0,29 0,17 BM3 24,01 4,87 1,4 BM4 41,28 12,95 3,55 BM5 4,99 1,05 0,33 BM6 0,55 0,16 0,08 Q2 (simple,many results) BM1 46,17 2,10 6,82 BM2 92,03 4,20 15,83 BM3 82,68 4,12 15,28 BM4 19,93 1,04 2,81 BM5 3,69 0,21 1,36 BM6 0,74 0,045 1,00 Q3 (complex) BM1 0,001 0,001 0,07 BM2 0,006 0,003 0,12 BM3 0,002 0,003 0,31 BM4 0,005 0,001 0,20 BM5 0,006 0,013 0,20 BM6 0,001 0,001 0,13 Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark 15 July1st, 2016
  • 16. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Query time related to result count For Q1 for each of the considered approaches (green = SPIN; blue = EYE; black = Stardog) For Q2 for each of the considered approaches (green = SPIN; blue = EYE; black = Stardog) Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark 16 July1st, 2016
  • 17. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Additional findings • The three considered procedures arequite farapartfrom each other, explaining the considerable performance differences, not only between the procedures, but alsobetween diverse usages within one and the samesystem. • Algorithms and optimization techniques used for each approach aren't entirely used: differences in indexation algorithms, query rewriting techniques and rule handling strategies used. Indexing algorithms, query rewriting techniques, and rule handling strategies • The disadvantage of forward-chaining reasoning process is that millions of triples can bematerialized (EYE, SPIN forQ1 and Q2) • Using backward-chaining reasoning allows avoiding triple materialization, thus saving query execution time (Stardog, SPIN forQ3). Forward- versus backward-chaining • Query Q3 triggers a rule that in turn triggers several other rules in the rule set. If the firstrule does not fire, however, the process stops early. • Query Q2, however, fires relatively long rules. It takes more time to make these matches in all three approaches. Typeof data in the building model • Loading files in memory at query execution time leads to considerable delays. Impact of the triple store • Linear relation: the more results areavailable, the more triples need to bematched, leading to more assertions. Impact of the number of output results Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark 17 July1st, 2016
  • 18. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Conclusion and future work ◼ Comparisonof3 differentapproaches  SPIN, EYE andStardog ◼ 3 queriesappliedover 6 differentbuilding models ◼ Futureworkconsistsin  Specifying morethis initial performancebenchmarkwith additional data and rules  Executing additional queries on the rest of the set of building models  Comparingresults ona wider scale: ―forthe individual approaches separately, ―as well as with other approaches not considered here. Queryingand reasoningover large scale buildingdatasets: anoutline of a performancebenchmark 18 July1st, 2016
  • 19. AnaROXIN–ana-maria.roxin@u-bourgogne.fr PieterPAUWELS–Pieter.pauwels@ugent.be Thank you for your attention. Pieter Pauwels, Tarcisio Mendes deFarias, ChiZhang, Ana Roxin, JakobBeetz, Jos De Roo, Christophe Nicolle International Workshop on Semantic Big Data (SBD 2016) in conjunction with the 2016 ACM SIGMOD Conference in San Francisco, USA

Notes de l'éditeur

  1. Depending on the rules that are being triggered, one set of information then has the potential to be made available in a diverse number of forms, bringing an entirely new form of interoperability for an industry that has always relied heavily on the combination of an agreed standard with many intransparent import and export procedures that were implemented using procedural programming languages.