Distributed Incremental 
Model Queries over the Cloud: 
Engineering and Deployment Challenges 
Budapest University of Technology and Economics 
Department of Measurement and Information Systems 
Dániel Varró 
Budapest University of Technology and Economics 
Fault Tolerant Systems Research Group
Outline of the Talk 
Motivation & Background: 
• Validation of design rules 
• Graph pattern matching 
Incremental Model Queries: 
The EMF-IncQuery framework 
• Language - Execution 
Distributed Incremental 
Model Queries (IncQuery-D) 
• Architecture - 
Performance Benchmarks 
• Distributed model load 
• Incremental query evaluation 
 Main Contributors 
o István Ráth (lead) 
o Ákos Horváth 
o Gábor Bergmann 
o Ábel Hegedüs 
o Zoltán Ujhelyi 
o Benedek Izsó 
o Gábor Szárnyas 
o Csaba Debreceni 
o Dénes Harmath 
o József Makai 
o Dániel Stein
SCALABLE 
MODEL DRIVEN ENGINEERING
Scalable MDE: The MONDO Project 
Models and 
Languages 
• Large and 
heterogeneous 
• Construction 
• Visualization 
Queries and 
Transformations 
• Executed over 
large models 
• Incremental 
• Lazy 
• Parallel 
Collaboration 
• Offline (SVN) 
• Online (Gdocs) 
• Many 
collaborators 
• Secure access 
Persistent 
Storage 
• Efficient 
• Secure 
• Interoperability 
Case studies: 
• validate solutions through real case studies 
• guided by industrial advisory board 
Prototype tools: 
• open source software 
• open benchmarks 
Academic Partners: 
• Univ. York (UK) 
Univ. Autónoma Madrid (ES), 
ARMINES (FR), BME (HU) 
Industrial Partners: 
• The Open Group (UK), 
Uninova (PT), Softeam (FR), 
Soft-Maint (FR), IKERLAN (ES)
MOTIVATION FOR 
INCREMENTAL MODEL QUERIES
Motivation: Early validation of design rules 
SystemSignalGroup design rule (from AUTOSAR) 
o A SystemSignal and its group must be in the same IPdu 
o Challenge: find violations quickly in large models 
o New difficulties 
• reverse 
navigation 
• complex 
manual 
solution 
AUTOSAR: 
• standardized SW architecture 
of the automotive industry 
• now supported by modern modeling tools 
Design Rule/Well-formedness constraint: 
• each valid car architecture needs to respect 
• designers are immediately notified if violated 
Challenge: 
• >500 design rules in AUTOSAR tools 
• >1 million elements in AUTOSAR models 
• models constantly evolve by designers
Domain-Specific Modeling Languages 
Abstract 
Meta-model 
Model 
«type»
Validation of Well-formedness Constraints 
Meta-model 
Model 
Query 
pattern switchWOSignal(sw) {
    Switch(sw);
    neg find switchHasSignal(sw);
}
pattern switchHasSignal(sw) {
    Switch(sw);
    Signal(sig);
    Signal.mountedTo(sig, sw);
}
Modify 
User 
Result
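The two patterns on this slide can be read as set operations. The following is a minimal Python sketch, not EMF-IncQuery code: the toy model, the element names and the `mounted_to` relation layout are assumptions made for illustration. It shows that `neg find` behaves like a set difference over the matches of the called pattern.

```python
# Toy model (hypothetical data): element names and layout are illustrative only.
switches = {"sw1", "sw2", "sw3"}
signals = {"sig1", "sig2"}
mounted_to = {("sig1", "sw1"), ("sig2", "sw3")}  # Signal.mountedTo(sig, sw)


def switch_has_signal():
    """Matches of switchHasSignal(sw): switches with at least one mounted signal."""
    return {sw for (sig, sw) in mounted_to if sig in signals and sw in switches}


def switch_wo_signal():
    """Matches of switchWOSignal(sw): 'neg find' acts as a set difference here."""
    return switches - switch_has_signal()


violations = switch_wo_signal()
print(sorted(violations))
```

Every element of `violations` is a switch that breaks the well-formedness constraint and would be reported back to the user.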
Model sizes in practice 
 Models with 10M+ elements are common: 
o Car industry 
o Avionics 
o Source code analysis 
 Models evolve and change continuously 
Validation can take hours 
Application          Model size
System models        10^8
Sensor data          10^9
Geospatial models    10^12
Source: Markus Scheidgen, How Big are Models – An Estimation, 2012.
MODEL QUERIES AND 
GRAPH PATTERN MATCHING
What is a model query? 
 For a programmer: 
o A piece of code that searches for parts of the model 
 For the scientist: 
o Query = set of constraints that have to be satisfied by 
(parts of) the (graph) model 
o Result = set of model element tuples that satisfy the 
constraints of the query 
o Match = bind constraint variables to model elements 
 A query engine: Supports 
o the definition & execution 
of model queries 
Query(A,B) ⇔ ∧ cond_i(A_i,B_i) 
• all tuples of model elements a,b 
• satisfying the query condition 
• along the match A=a and B=b 
• parameters A,B can be input/ output
Graph Pattern Matching for Queries 
route: Route sp: SwitchPosition 
routeDefinition 
sensor: Sensor switch: Switch 
 Match: 
o m: L → G 
(graph morphism) 
o CSP: 
• Variables: Nodes of L 
• Constraints: Edges of L 
• Domain values: G 
o Complexity: |G|^|L| 
L 
G 
straight 
left 
switchPosition 
switch 
sensor 
All sensors with a switch that belongs to a route must directly be linked to the same route.
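To make the CSP view and the |G|^|L| bound concrete, here is a small Python sketch of naive matching. It is an illustration only: the host graph is made-up sample data, and only the positive path route, sp, switch, sensor of the pattern is matched (the routeDefinition edge is left out).

```python
from itertools import product

# Host graph G (hypothetical sample data): typed nodes and labelled edges.
node_type = {"r1": "Route", "sp1": "SwitchPosition", "sw1": "Switch",
             "s1": "Sensor", "s2": "Sensor"}
edges = {
    ("r1", "switchPosition", "sp1"),
    ("sp1", "switch", "sw1"),
    ("sw1", "sensor", "s1"),
    ("r1", "routeDefinition", "s2"),
}

# Pattern L: variables with their types, and edge constraints between variables.
pattern_nodes = {"route": "Route", "sp": "SwitchPosition",
                 "switch": "Switch", "sensor": "Sensor"}
pattern_edges = [("route", "switchPosition", "sp"),
                 ("sp", "switch", "switch"),
                 ("switch", "sensor", "sensor")]


def brute_force_matches():
    """Try every assignment of G nodes to L variables: |G|^|L| candidates."""
    variables = list(pattern_nodes)
    matches, tried = [], 0
    for values in product(node_type, repeat=len(variables)):
        tried += 1
        m = dict(zip(variables, values))
        types_ok = all(node_type[m[v]] == t for v, t in pattern_nodes.items())
        edges_ok = all((m[a], label, m[b]) in edges for a, label, b in pattern_edges)
        if types_ok and edges_ok:
            matches.append(m)
    return matches, tried


matches, tried = brute_force_matches()
print(tried, matches)
```

With 5 host nodes and 4 pattern variables the loop inspects 5^4 = 625 assignments to find a single match, which is why practical engines rely on search plans or caching instead.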
Graph Pattern Matching (Local Search) 
switchPosition 
route: Route sp: SwitchPosition 
switch 
routeDefinition 
sensor 
sensor: Sensor switch: Switch 
 Search Plan: 
o Select the first node 
to be matched 
o Define an ordering on 
graph pattern edges 
 Search is restarted from 
scratch each time 
1 
2 
0 
3 
4 
straight 
left
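A search plan can be sketched as a start variable plus an ordered list of pattern edges that a depth-first traversal follows. The Python below is an illustrative sketch under assumptions: the adjacency data is invented, and the plan order is one possible choice rather than the one numbered on the slide.

```python
# Hypothetical host graph, stored as adjacency lists per (source, edge label).
routes = ["r1", "r2"]
out = {
    ("r1", "switchPosition"): ["sp1", "sp2"],
    ("sp1", "switch"): ["sw1"],
    ("sp2", "switch"): ["sw2"],
    ("sw1", "sensor"): ["s1"],
    ("sw2", "sensor"): ["s1"],
}

# Search plan: the first variable to bind, then an ordering on the pattern edges.
first_variable = "route"
plan = [("route", "switchPosition", "sp"),
        ("sp", "switch", "switch"),
        ("switch", "sensor", "sensor")]


def local_search():
    """Depth-first traversal along the plan; recomputed from scratch on every call."""
    results = []

    def extend(binding, step):
        if step == len(plan):
            results.append(dict(binding))
            return
        src, label, dst = plan[step]
        for target in out.get((binding[src], label), []):
            binding[dst] = target
            extend(binding, step + 1)
            del binding[dst]  # backtrack before trying the next candidate

    for start in routes:
        extend({first_variable: start}, 0)
    return results


print(local_search())
```

Nothing is remembered between calls, so every model change requires a full re-run, which is the cost the incremental approach tries to avoid.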
Incremental Graph Pattern Matching 
switchPosition 
route: Route sp: SwitchPosition 
switch 
routeDefinition 
sensor 
sensor: Sensor switch: Switch 
 Main idea: More space to less time 
o Cache matches of patterns 
o Instantly retrieve match (if valid) 
o Update caches upon model changes 
o Notify about relevant changes 
 Approaches: 
o TREAT, LEAPS, RETE, … 
o Tools: VIATRA, GROOVE, MoTE, TCore 
straight 
left 
route sp switch sensor 
r1 sp1 sw1
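The "more space to less time" idea can be sketched for a two-edge fragment of the pattern. This is a simplified Python illustration, not the algorithm of any of the listed tools: the class name, the method names and the listener callback signature are all hypothetical.

```python
from collections import defaultdict


class IncrementalMatcher:
    """Caches matches of (sp)-switch->(sw)-sensor->(sen); updates them per edge change."""

    def __init__(self):
        self.switch_of = defaultdict(set)  # sw -> {sp}
        self.sensor_of = defaultdict(set)  # sw -> {sen}
        self.matches = set()               # cached result set
        self.listeners = []                # callbacks taking (kind, match)

    def _emit(self, kind, match):
        if kind == "+":
            self.matches.add(match)
        else:
            self.matches.discard(match)
        for listener in self.listeners:    # notify about relevant changes
            listener(kind, match)

    def add_switch_edge(self, sp, sw):
        self.switch_of[sw].add(sp)
        for sen in self.sensor_of[sw]:
            self._emit("+", (sp, sw, sen))

    def add_sensor_edge(self, sw, sen):
        self.sensor_of[sw].add(sen)
        for sp in self.switch_of[sw]:
            self._emit("+", (sp, sw, sen))

    def remove_sensor_edge(self, sw, sen):
        if sen not in self.sensor_of[sw]:
            return                         # nothing cached depends on this edge
        self.sensor_of[sw].discard(sen)
        for sp in self.switch_of[sw]:
            self._emit("-", (sp, sw, sen))


m = IncrementalMatcher()
log = []
m.listeners.append(lambda kind, match: log.append((kind, match)))
m.add_switch_edge("sp1", "sw1")
m.add_sensor_edge("sw1", "s1")
m.remove_sensor_edge("sw1", "s1")
print(log, m.matches)
```

Reading `m.matches` costs nothing beyond a lookup; the work is paid at update time and is proportional to the matches affected by the change.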
Batch vs. Live Query Scenarios 
 Batch query 
(pull / request-driven): 
1. Designer selects a query 
2. One/All matches are 
calculated 
3. Rule is applied on one/all 
matches 
4. All Steps 1-3 are redone if 
model changes 
 Query results obtained 
upon designer demand 
 Live query 
(push / event-driven): 
1. Model is loaded 
2. Rule system is loaded 
3. Calculate full match set 
4. Model is changed (rules 
fired or designer updates) 
5. Iterate Steps 3 and 4 until 
rule system is stopped 
 Query results are always 
available for designer
INCREMENTAL MODEL QUERIES: 
THE EMF-INCQUERY PROJECT
EMF-IncQuery: An Open Source Eclipse Project 
• Declarative graph query 
language 
• Transitive closure, 
Negative cond., etc. 
• Compositional, reusable 
Definition 
• Incremental evaluation 
• Cache result set 
• Maintain incrementally 
upon model change 
Execution 
• Derived features, 
• On-the-fly validation 
• View generation, 
Notifications, Soft links, 
Databinding, 
Features 
http://eclipse.org/incquery
The IncQuery (IQ) Graph Query Language 
route: Route sp: SwitchPosition 
routeDefinition 
sensor: Sensor Switch: Switch 
 IQ: declarative query language 
o Attribute constraints 
o Local + global queries 
o Compositionality + Reusability 
o Recursion, Negation, 
o Transitive Closure 
o Syntax: DATALOG style 
pattern routeSensor(sensor: Sensor) = {
    TrackElement.sensor(switch, sensor);
    Switch(switch);
    SwitchPosition.switch(sp, switch);
    SwitchPosition(sp);
    Route.switchPosition(route, sp);
    Route(route);
    neg find head(route, sensor);
}
pattern head(R, Sen) = {
    Route.routeDefinition(R, Sen);
}
ModelQuery(A,B): 
• tuples of model elements A, B 
• satisfying the query condition 
• enumerate 1 / all instances 
• A,B can be input or output 
switchPosition 
switch 
sensor
TOOL DEMO: INCQUERY Development Tools 
Query Explorer 
Pattern Editor 
Queries are applied & 
updated on-the-fly 
• Works with most EMF 
editors out-of-the-box 
• Reveals matches as 
selection
Incremental Query Evaluation by RETE 
 AUTOSAR well-formedness validation rule 
Communication 
channel 
Logical signal Mapping Physical signal 
 Instance model 
Invalid model fragment 
Valid model fragment
Incremental Query Evaluation by RETE 
Read the changes in the 
result set (deltas) 
join 
join 
antijoin 
Result set 
Communication 
channel 
Logical signal Mapping Physical signal
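The join, join, antijoin shape of the network can be sketched in a few lines of Python. This is a simplified illustration, not the EMF-IncQuery Rete implementation: it uses the railway `routeSensor` pattern instead of the AUTOSAR rule, joins only on the last/first tuple element, and all class and function names are hypothetical.

```python
class Join:
    """Beta node: joins left and right tuples where left[-1] == right[0]."""

    def __init__(self, emit):
        self.left, self.right, self.emit = set(), set(), emit

    def update(self, sign, tup, side):
        own, other = (self.left, self.right) if side == "L" else (self.right, self.left)
        (own.add if sign > 0 else own.discard)(tup)
        for o in list(other):
            l, r = (tup, o) if side == "L" else (o, tup)
            if l[-1] == r[0]:
                self.emit(sign, l + r[1:])  # propagate the delta downstream


class AntiJoin:
    """Passes left tuples (route, ..., sensor) that have no right tuple (route, sensor)."""

    def __init__(self, emit):
        self.left, self.right, self.emit = set(), set(), emit

    def update(self, sign, tup, side):
        if side == "L":
            (self.left.add if sign > 0 else self.left.discard)(tup)
            if (tup[0], tup[-1]) not in self.right:
                self.emit(sign, tup)
        else:
            (self.right.add if sign > 0 else self.right.discard)(tup)
            for l in list(self.left):
                if (l[0], l[-1]) == tup:
                    self.emit(-sign, l)  # a new blocker retracts, a removed one restores


results = set()  # memory of the production node


def production(sign, tup):
    (results.add if sign > 0 else results.discard)(tup)


anti = AntiJoin(production)
join2 = Join(lambda s, t: anti.update(s, t, "L"))
join1 = Join(lambda s, t: join2.update(s, t, "L"))

# Input deltas: +1 inserts an edge, -1 deletes it.
join1.update(+1, ("r1", "sp1"), "L")   # Route.switchPosition
join1.update(+1, ("sp1", "sw1"), "R")  # SwitchPosition.switch
join2.update(+1, ("sw1", "s1"), "R")   # TrackElement.sensor
print(results)                         # one violating match
anti.update(+1, ("r1", "s1"), "R")     # Route.routeDefinition added
print(results)                         # empty again
```

Each model change enters as a signed delta at an input, and only the tuples that depend on it are recomputed on the way to the result set.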
Performance of EMF-INCQUERY 
 Incremental graph queries based on Rete 
 Built for the Eclipse Modeling Framework 
model size 
runtime 
batch 
queries 
incremental 
queries 
Runtime is proportional to 
the size of the modification.
Performance of EMF-INCQUERY 
Storing partial results 
memory consumption 
incremental 
queries 
batch 
queries 
memory 
limit 
model size
Selected Applications (EMF-IncQuery) 
• Complex traceability 
• Query driven views 
• Abstract models by 
derived objects 
Toolchain for 
IMA configs 
• Connect to Matlab 
Simulink model 
• Export: Matlab2EMF 
• Change model in EMF 
• Re-import: 
EMF2Matlab 
MATLAB-EMF 
Bridge 
• Live models 
(refreshed 25 
frame/s) 
• Complex event 
processing 
Gesture 
recognition 
• Experiments on open 
source Java projects 
• Local search vs. 
Incremental vs. 
Native Java code 
Detection of 
bad code smells 
• Rules for operations 
• Complex structural 
constraints (as GP) 
• Hints and guidance 
• Potentially infinite 
state space 
Design Space 
Exploration 
• Itemis (developer) 
• Embraer 
• Thales 
• ThyssenKrupp 
• CERN 
Known Users
INCQUERY-D: DISTRIBUTED 
INCREMENTAL MODEL QUERIES
Goals of INCQUERY-D 
 Objectives 
o Distributed incremental pattern matching 
o Adaptation of EMF-INCQUERY’s tooling to graph DBs 
o Executed over cloud infrastructure (COTS hardware) 
 Achieve scalability by avoiding memory bottleneck 
o Sharding separately 
• Data 
• Indexers 
• Query network 
o In memory: 
• Index + Query 
Assumptions 
• All Rete nodes fit on a server node 
• Indexers can be filled efficiently 
• Modification size ≪ model size 
• The application requires the complete result 
set of the query (opposed to just one match)
Dimensions of Scalability 
 Infrastructure 
o Number of machines 
o Available memory / CPU 
o Network performance 
o Number of concurrent users 
 Model 
o Model size 
o Model characteristics 
 Queries 
o Number of queries 
o Query complexity 
Metrics
From EMF-INCQUERY to INCQUERY-D 
EMF-INCQUERY 
Transaction 
Rete net 
Indexer 
layer 
In-memory 
EMF model 
Production network 
• Stores intermediate query results 
• Propagates changes 
Indexing 
In-memory storage
Database 
shard 0 
INCQUERY-D Architecture 
Database 
shard 1 
Server 1 
Database 
shard 2 
Server 2 
Database 
shard 3 
Server 3 
Transaction 
Server 0 
Rete net 
Indexer 
layer 
INCQUERY-D 
Distributed query evaluation network 
Distributed indexer Model access adapter 
Distributed indexing, 
notification 
Distributed persistent 
storage 
Distributed production network 
• Each intermediate node can be allocated 
to a different host 
• Remote internode communication
INCQUERY-D Architecture 
Join 
Database 
shard 1 
Server 1 
Join 
Database 
shard 2 
Server 2 
Triple store (4store), 
Document DB (Mongo), 
RDF over Column family 
(Cumulus) 
Database 
shard 3 
Server 3 
Transaction 
Database 
shard 0 
In-memory 
EMF model 
Server 0 
Antijoin 
Indexer 
layer 
INCQUERY-D 
Akka 
Indexer Indexer Indexer Indexer
Termination Protocol in INCQUERY-D 
Database 
shard 0 
When a production node is reached, 
an ACK message is sent back 
Stack added to each update msg 
Database 
shard 1 
Server 1 
• Registers the Rete nodes the 
message passes through 
Database 
shard 2 
Server 2 
User retrieves 
query result 
Database 
shard 3 
Server 3 
Transaction 
Server 0 
INCQUERY-D 
Join 
Join 
Antijoin 
Indexer Indexer Indexer Indexer
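The termination protocol can be approximated by a single-process simulation. This Python sketch rests on several assumptions: the node names and topology are invented, message delivery is a plain queue rather than Akka actors, and the case where a join produces no output is not modelled.

```python
from collections import deque

# Hypothetical Rete topology: node -> children; "prod" is the production node.
children = {"in1": ["join1"], "join1": ["join2"], "join2": ["prod"], "prod": []}


def propagate(start):
    """Deliver one update from an input node; return the ACK routes sent back."""
    inbox = deque([{"node": start, "stack": []}])
    acks = []
    while inbox:
        msg = inbox.popleft()
        # Register the Rete node the message passes through.
        stack = msg["stack"] + [msg["node"]]
        if msg["node"] == "prod":
            # Production node reached: the ACK travels back along the recorded route.
            acks.append(list(reversed(stack)))
        for child in children[msg["node"]]:
            inbox.append({"node": child, "stack": stack})
    return acks


print(propagate("in1"))
```

Once the sender has received the ACK for every update it issued, the result set is consistent and can be handed to the user.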
IncQuery-D Architectural Layers 
• Gremlin, Cypher 
• SPARQL 
• IQPL (IncQuery) 
High-Level 
Query Lang 
• Distributed Indexers 
(MONDIX) 
• SPARQL 
Low-Level 
Query Lang 
• Cayley 
• Titan 
• 4store 
Distributed 
Graph DB 
• MongoDB 
• Cassandra 
• 4store 
Native 
Storage 
• RDF 
• XMI / Ecore 
• Property Graphs 
Storage 
Format 
• Global queries 
• Complex navigations 
• Efficient element access by indices 
• Local queries 
• Can be transparent (via indexers) 
• Integrates popular graph storages 
• Efficient NoSQL storages 
• Triple stores 
• Standardized data formats 
• Popular interchange formats
Summary: Key Components of IncQuery-D 
Distributed 
Model Storage 
• Adaptable to 
different back-end 
storages 
• Agnostic to 
graph repres. 
• TripleStores 
(RDF), EMF, 
Property 
graph 
Model Access 
Adapter 
• Surrogate key 
to identify 
distributed 
elements 
• Graph manip. 
API 
• Change 
notifications 
Distributed 
Indexer 
• Type-instance 
indices, etc. 
• Stored on 
multiple 
servers 
• Protects against 
exceeding 
memory limits 
Distributed 
Query Evaluator 
• Distributed 
RETE network 
• Distributed 
termination 
protocol 
• Constructed 
and deployed 
by coordinator 
node 
Decouple and separately distribute Storage, Indexer and Query layers
USAGE PHASES
Load 
Model 
(1) Loading a Query 
Update 
Model 
Request 
Result 
Deploy 
RETE 
RETE 
Network 
Allocate 
RETE 
Cloud 
Infra-structure 
Construct 
RETE 
Load 
Query 
Construct RETE 
• From EMF-IncQuery specs 
• Should incorporate 
infrastructure constraints 
Deploy RETE 
• Managed by a 
coordinator node 
• Intelligent sharding of 
RETE nodes
Load 
Model 
(2) Loading a Model 
Update 
Model 
Request 
Result 
Model 
shards 
Deploy 
RETE 
RETE 
Network 
Allocate 
RETE 
Maintain 
Result Set 
Cloud 
Infra-structure 
Construct 
RETE 
Model 
Access 
Adapter 
Load 
Query 
Load model 
• Model traversal 
• Init indexers 
• Network 
communication
Load 
Model 
(3) Updating a Model 
Update 
Model 
Request 
Result 
Model 
shards 
Deploy 
RETE 
RETE 
Network 
Allocate 
RETE 
Maintain 
Result Set 
Cloud 
Infra-structure 
Construct 
RETE 
Model 
Access 
Adapter 
Load 
Query 
Model manipulation 
• Update messages 
• Create / Delete
(4) Requesting Query Result 
Load 
Model 
Update 
Model 
Request 
Result 
Model 
shards 
Deploy 
RETE 
RETE 
Network 
Allocate 
RETE 
Evaluate 
Query 
Maintain 
Result Set 
Cloud 
Infra-structure 
Construct 
RETE 
Model 
Access 
Adapter 
Load 
Query 
Evaluate query 
• Process incoming 
messages 
• Propagate along 
RETE network 
Retrieve results 
• instantly
(5) Monitoring and Reconfiguration 
Load 
Model 
Update 
Model 
Request 
Result 
Model 
shards 
Deploy 
RETE 
RETE 
Network 
Allocate 
RETE 
Evaluate 
Query 
Maintain 
Result Set 
Cloud 
Infra-structure 
Monitor & Manage 
Construct 
RETE 
Model 
Access 
Adapter 
Load 
Query 
Visualized on a 
web-based dashboard 
OS metrics JVM metrics Akka metrics Rete metrics
DEPLOYMENT PROCESS FOR 
DISTRIBUTED RETE
RETE Deployment Process 
Query 
Language 
Query 
Predicates 
RETE 
Structure 
route: Route sp: SwitchPosition 
routeDefinition 
sensor: Sensor Switch: Switch 
Platform 
Description 
Allocation / 
Mapping 
Deployment 
Descriptor 
pattern routeSensor(sensor: Sensor) = {
    TrackElement.sensor(switch, sensor);
    Switch(switch);
    SwitchPosition.switch(sp, switch);
    SwitchPosition(sp);
    Route.switchPosition(route, sp);
    Route(route);
    neg find head(route, sensor);
}
pattern head(R, Sen) = {
    Route.routeDefinition(R, Sen);
}
switchPosition 
switch 
sensor
Tooling: RDF Pattern Language 
import "http://www.semanticweb.org/ontologies/2011/1/TrainRequirementOntology.owl" 
pattern posLength(Segment, SegmentLength) {
    Segment(Segment);
    Segment.Segment_length(Segment, SegmentLength);
    check(SegmentLength <= 0);
}
EMF-IncQuery syntax 
Vocabulary <railway.rdf> 
base <http://www.semanticweb.org/ontologies/2011/1/TrainRequirementOntology.owl#> 
pattern posLength(Segment, SegmentLength) {
    Segment(Segment);
    Segment_length(Segment, SegmentLength);
    check("SegmentLength <= 0");
}
segment: Segment 
segment.length 0 
RDF-IncQuery syntax 
Xbase (compiles to Java) 
Javascript
RETE Deployment Process 
 Construct language-independent 
constraints 
 Resolution of 
o syntactic sugar 
o type information 
Query 
Language 
Query 
Predicates 
RETE 
Structure 
Platform 
Description 
Allocation / 
Mapping 
Deployment 
Descriptor 
Variables route sp switch 
Parameter sensor 
Edge: SwitchPosition.switch 
Edge: TrackElement.sensor 
Edge: Route.switchPosition 
Negation: head 
Constraints
RETE Deployment Process 
 Construct RETE structure 
(platform independently) 
 Optimizations: 
o Model statistics 
o Expected usage profile 
Query 
Language 
Query 
Predicates 
RETE 
Structure 
Platform 
Description 
Allocation / 
Mapping 
Deployment 
Descriptor 
join 
join 
join
RETE Deployment Process 
 Architecture model 
(Cloud infrastructure) 
o Virtual Machines 
• Memory limits 
• CPU speed 
• Storage capacity 
o Communication Channels 
• Bandwidth 
 Specified by a textual DSL 
(Xtext) 
Query 
Language 
Query 
Predicates 
RETE 
Structure 
1 2 
3 4 
Platform 
Description 
Allocation / 
Mapping 
Deployment 
Descriptor
RETE Deployment Process 
Machine Allocated Nodes 
1 In1, In2, Join2 
2 In3 
3 In4 
4 Join1, Join3 
Query 
Language 
Query 
Predicates 
RETE 
Structure 
1 2 
3 4 
Platform 
Description 
Allocation / 
Mapping 
Deployment 
Descriptor 
Join1 
Join3 
Join2 
In1 In2 In3 In4
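An allocation such as the one in the table can be checked for how much inter-node traffic stays on one machine. In the Python sketch below the machine-to-node mapping is taken from the table, while the Rete edges are an assumption: the slide does not list them, so the wiring is invented for illustration.

```python
# Allocation from the table: machine -> Rete nodes.
allocation = {1: ["In1", "In2", "Join2"], 2: ["In3"], 3: ["In4"], 4: ["Join1", "Join3"]}

# ASSUMPTION: hypothetical wiring of the Rete network (not given in the source).
rete_edges = [("In1", "Join2"), ("In2", "Join2"), ("In3", "Join1"),
              ("In4", "Join1"), ("Join1", "Join3"), ("Join2", "Join3")]

# Invert the allocation so every node knows its host machine.
host_of = {node: machine for machine, nodes in allocation.items() for node in nodes}

# A channel is remote when its two endpoints live on different machines.
remote = [(a, b) for a, b in rete_edges if host_of[a] != host_of[b]]
local = [edge for edge in rete_edges if edge not in remote]

print(len(local), "local channels,", len(remote), "remote channels")
```

A count like this is one possible input for an allocation heuristic, alongside the memory limits and bandwidth figures from the platform description.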
RETE Deployment Process 
 Configuration scripts for 
o Deployment 
o Communication 
middleware 
 Derived by automated 
code generation 
o Using Eclipse technology: 
EMF-IncQuery + Xtend 
Query 
Language 
Query 
Predicates 
RETE 
Structure 
Platform 
Description 
Allocation / 
Mapping 
Deployment 
Descriptor
DISTRIBUTED 
PERFORMANCE BENCHMARKS
The Train Benchmark 
 Model validation workload: 
o User edits the model 
o Instant validation of 
well-formedness constraints 
o Model is repaired accordingly 
 Scenario: 
o Load 
o Check 
o Edit 
o Re-Check 
 Models: 
o Randomly generated 
o Close to real world instances 
o Following different metrics 
o Customized distributions 
o Low number of violations 
 Queries: 
o Two simple queries 
(<2 objects, attributes) 
o Two complex queries 
(4-7 joins, negation, etc.) 
o Validated match sets 
Instance model → Read → Check (batch validation) → Edit → ReCheck (incremental validation) 
100x 
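The Load, Check, Edit, Re-Check scenario can be sketched as a small loop. This Python example is an illustration, not the Train Benchmark code: it uses a single simple rule (segment length <= 0), an invented random model, and reads the "100x" label as the number of Edit and Re-Check repetitions, which is an assumption.

```python
import random


def check(lengths):
    """Batch validation: scan every segment for a non-positive length."""
    return {seg for seg, length in lengths.items() if length <= 0}


def run_scenario(size=1000, edits=100, seed=0):
    rng = random.Random(seed)
    # Load: a randomly generated model with a low number of violations.
    lengths = {f"seg{i}": (0 if rng.random() < 0.01 else rng.randint(1, 500))
               for i in range(size)}
    segments = list(lengths)
    violations = check(lengths)            # Check
    for _ in range(edits):
        seg = rng.choice(segments)         # Edit: change one element
        lengths[seg] = rng.randint(-1, 500)
        # Re-Check, incrementally: only the edited element is re-evaluated.
        if lengths[seg] <= 0:
            violations.add(seg)
        else:
            violations.discard(seg)
    return violations, check(lengths)


incremental, batch = run_scenario()
print(incremental == batch)
```

Comparing the incrementally maintained set with a fresh batch check at the end mirrors the "validated match sets" requirement of the benchmark.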
Evaluation of distributed scalability 
 Extensions to previous work (single workstation) 
o Generation of large instance models 
o Distributed, parallel loading of models 
o Distributed transformation and validation 
Benchmark Distributed benchmark 
Model size 1K – 13M 1K – 88M 
Load method Batch Distributed, parallel 
Transformation and validation Single workstation Multiple servers
Batch graph scenario vs. Incremental scenario – IncQuery-D 
 Load and first validation: 
o Batch: load the graph to the databases and execute the query 
o Incremental: load the graph to the databases, initialize the Rete net and retrieve the results 
 Transformation: 
o Batch: query the graph and delete some elements 
o Incremental: incrementally query the graph and delete some elements, propagate the changes 
 Revalidation: 
o Batch: execute the query 
o Incremental: retrieve the results from the Rete net 
[Diagram: GraphML → Load and first validation → Transformation → Revalidation; components: DB shards, Rete net, Result set] 
Benchmark environment 
 Private cloud 
 Different DBMSs 
 Query 
o The DBMS’s own query language 
o IncQuery-D 
SPARQL Gremlin
Load and first validation 
55M model: approx. 15 minutes 
Rete network’s initialization overhead pays off 
[Plot: Runtime [s] (axis ticks 1 to 4096) vs. Model size [million elements]; legend: 4store IncQuery-D Titan IncQuery-D 4store] 
Model modification 
1. Elementary model query 
2. Model modification 
2 orders of magnitude 
– Query from the Rete network’s indexer 
– Propagation of modifications is fast 
[Plot: Runtime [s] (axis ticks 1 to 256) vs. Model size [million elements]; legend: 4store IncQuery-D Titan IncQuery-D 4store] 
Revalidation 
Different characteristics 
Sub-second response time for models with 88M elements 
[Plot: Runtime [s] (axis ticks 0.01 to 128) vs. Model size [million elements], with the memory limit marked; legend: 4store IncQuery-D Titan IncQuery-D 4store] 
Benchmarking Conclusions 
 Memory consumption 
o Single workstation: 13M model, 4 GB 
o Cloud of four servers: 55M model, <4×8 GB 
 Runtime 
o Same order of magnitude and similar characteristics to 
the single workstation tool 
INCQUERY-D is scalable and significantly more efficient for query 
evaluation than the native query engines in 4store, Titan and Neo4j
Conclusions

Contenu connexe

Similaire à IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineering and Deployment Challenges

Planning and Control Algorithms Model-Based Approach (State-Space)
Planning and Control Algorithms Model-Based Approach (State-Space)Planning and Control Algorithms Model-Based Approach (State-Space)
Planning and Control Algorithms Model-Based Approach (State-Space)
M Reza Rahmati
 
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
inside-BigData.com
 

Similaire à IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineering and Deployment Challenges (20)

SERENE 2014 School: Incremental Model Queries over the Cloud
SERENE 2014 School: Incremental Model Queries over the CloudSERENE 2014 School: Incremental Model Queries over the Cloud
SERENE 2014 School: Incremental Model Queries over the Cloud
 
Incremental Queries and Transformations for Engineering Critical Systems
Incremental Queries and Transformations for Engineering Critical SystemsIncremental Queries and Transformations for Engineering Critical Systems
Incremental Queries and Transformations for Engineering Critical Systems
 
Incremental Model Queries for Model-Dirven Software Engineering
Incremental Model Queries for Model-Dirven Software EngineeringIncremental Model Queries for Model-Dirven Software Engineering
Incremental Model Queries for Model-Dirven Software Engineering
 
Hardware-Software allocation specification of IMA systems for early simulation
Hardware-Software allocation specification of IMA systems for early simulationHardware-Software allocation specification of IMA systems for early simulation
Hardware-Software allocation specification of IMA systems for early simulation
 
IncQuery Labs Models 2020 MIP Talk
IncQuery Labs Models 2020 MIP TalkIncQuery Labs Models 2020 MIP Talk
IncQuery Labs Models 2020 MIP Talk
 
IncQuery-D: Incremental Queries in the Cloud
IncQuery-D: Incremental Queries in the CloudIncQuery-D: Incremental Queries in the Cloud
IncQuery-D: Incremental Queries in the Cloud
 
Modelon Modelica executable requirements Ansys Conference 2016
Modelon Modelica executable requirements Ansys Conference 2016Modelon Modelica executable requirements Ansys Conference 2016
Modelon Modelica executable requirements Ansys Conference 2016
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Intro to LV in 3 Hours for Control and Sim 8_5.pptx
Intro to LV in 3 Hours for Control and Sim 8_5.pptxIntro to LV in 3 Hours for Control and Sim 8_5.pptx
Intro to LV in 3 Hours for Control and Sim 8_5.pptx
 
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery LabsIncquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
 
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceCOMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
 
Decreasing your Coffe Consumption by Incremental Code regeneration
Decreasing your Coffe Consumption by Incremental Code regenerationDecreasing your Coffe Consumption by Incremental Code regeneration
Decreasing your Coffe Consumption by Incremental Code regeneration
 
Towards Scalable Validation of Low-Code System Models: Mapping EVL to VIATRA ...
Towards Scalable Validation of Low-Code System Models: Mapping EVL to VIATRA ...Towards Scalable Validation of Low-Code System Models: Mapping EVL to VIATRA ...
Towards Scalable Validation of Low-Code System Models: Mapping EVL to VIATRA ...
 
Model visualization made easy: Incremental query-driven views in modeling tools
Model visualization made easy: Incremental query-driven views in modeling toolsModel visualization made easy: Incremental query-driven views in modeling tools
Model visualization made easy: Incremental query-driven views in modeling tools
 
Planning and Control Algorithms Model-Based Approach (State-Space)
Planning and Control Algorithms Model-Based Approach (State-Space)Planning and Control Algorithms Model-Based Approach (State-Space)
Planning and Control Algorithms Model-Based Approach (State-Space)
 
Eclipse VIATRA Overview 2017
Eclipse VIATRA Overview 2017Eclipse VIATRA Overview 2017
Eclipse VIATRA Overview 2017
 
A High-Level Programming Approach for using FPGAs in HPC using Functional Des...
A High-Level Programming Approach for using FPGAs in HPC using Functional Des...A High-Level Programming Approach for using FPGAs in HPC using Functional Des...
A High-Level Programming Approach for using FPGAs in HPC using Functional Des...
 
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
 
Testing Model Transformations
Testing Model TransformationsTesting Model Transformations
Testing Model Transformations
 

Dernier

introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 

Dernier (20)

VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
ManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide Deck
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 

IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineering and Deployment Challenges

  • 1. Distributed Incremental Model Queries over the Cloud: Engineering and Deployment Challenges Budapest University of Technology and Economics Department of Measurement and Information Systems Dániel Varró Budapest University of Technology and Economics Fault Tolerant Systems Research Group
  • 2. Outline of the Talk Motivation & Background: • Validation of design rules • Graph pattern matching Incremental Model Queries: The EMF-IncQuery framework • Language - Execution Distributed Incremental Model Queries (IncQuery-D) • Architecture - Performance Benchmarks • Distributed model load • Incremental query evaluation  Main Contributors o István Ráth (lead) o Ákos Horváth o Gábor Bergmann o Ábel Hegedüs o Zoltán Ujhelyi o Benedek Izsó o Gábor Szárnyas o Csaba Debreceni o Dénes Harmath o József Makai o Dániel Stein
  • 3. SCALABLE MODEL DRIVEN ENGINEERING
  • 4. Scalable MDE: The MONDO Project Models and Languages • Large and heterogeneous • Construction • Visualization Queries and Transformations • Executed over large models • Incremental • Lazy • Parallel Collaboration • Offline (SVN) • Online (Gdocs) • Many collaborators • Secure access Persistent Storage • Efficient • Secure • Interoperability Case studies: • validate solutions through real case studies • guided by industrial advisory board Prototype tools: • open source software • open benchmarks Academic Partners: • Univ. York (UK) Univ. Autónoma Madrid (ES), ARMINES (FR), BME (HU) Industrial Partners: • The Open Group (UK), Uninova (PT), Softeam (FR), Soft-Maint (FR), IKERLAN (ES)
  • 6. Motivation: Early validation of design rules SystemSignalGroup design rule (from AUTOSAR) o A SystemSignal and its group must be in the same IPdu o Challenge: find violations quickly in large models o New difficulties • reverse navigation • complex manual solution AUTOSAR: • standardized SW architecture of the automotive industry • now supported by modern modeling tools Design Rule/Well-formedness constraint: • each valid car architecture needs to respect • designers are immediately notified if violated Challenge: • >500 design rules in AUTOSAR tools • >1 million elements in AUTOSAR models • models constantly evolve by designers
  • 7. Domain-Specific Modeling Languages Abstract Meta-model Model «type»
  • 8. Validation of Well-formedness Constraints Meta-model Model Query pattern switchWOSignal(sw) { Switch(sw); neg find switchHasSignal(sw); } pattern switchHasSignal(sw) { Switch(sw); Signal(sig); Signal.mountedTo(sig, sw); } Modify User Result
  • 9. Model sizes in practice  Models with 10M+ elements are common: o Car industry o Avionics o Source code analysis  Models evolve and change continuously Validation can take hours Application / Model size: System models 10^8; Sensor data 10^9; Geospatial models 10^12. Source: Markus Scheidgen, How Big are Models – An Estimation, 2012.
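
The well-formedness query on slide 8 (switchWOSignal: a switch with no signal mounted to it) can be read as a set difference. What follows is a minimal Python sketch of a batch evaluation of that query, assuming a toy in-memory model. The `Model` class and the element names are hypothetical and are not part of EMF-IncQuery.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Toy instance model (hypothetical, for illustration only)."""
    switches: set = field(default_factory=set)
    signals: set = field(default_factory=set)
    mounted_to: set = field(default_factory=set)  # (signal, switch) edges


def switch_has_signal(model):
    """Matches of switchHasSignal(sw): switches with at least one mounted signal."""
    return {sw for (sig, sw) in model.mounted_to
            if sig in model.signals and sw in model.switches}


def switch_wo_signal(model):
    """Matches of switchWOSignal(sw): Switch(sw) and 'neg find switchHasSignal(sw)'."""
    return model.switches - switch_has_signal(model)


model = Model(
    switches={"sw1", "sw2", "sw3"},
    signals={"sig1", "sig2"},
    mounted_to={("sig1", "sw1"), ("sig2", "sw3")},
)
violations = switch_wo_signal(model)
print(sorted(violations))  # ['sw2']
```

Each call recomputes the whole match set, which is the batch behavior that becomes expensive on the model sizes listed above.
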
  • 10. MODEL QUERIES AND GRAPH PATTERN MATCHING
  • 11. What is a model query?  For a programmer: o A piece of code that searches for parts of the model  For the scientist: o Query = set of constraints that have to be satisfied by (parts of) the (graph) model o Result = set of model element tuples that satisfy the constraints of the query o Match = bind constraint variables to model elements  A query engine: o Supports the definition & execution of model queries $\mathit{Query}(A,B) \equiv \bigwedge_i \mathit{cond}_i(A_i,B_i)$ • all tuples of model elements a,b • satisfying the query condition • along the match A=a and B=b • parameters A,B can be input/output
  • 12. Graph Pattern Matching for Queries route: Route sp: SwitchPosition routeDefinition sensor: Sensor switch: Switch  Match: o m: L → G (graph morphism) o CSP: • Variables: Nodes of L • Constraints: Edges of L • Domain values: G o Complexity: |G|^|L| L G straight left switchPosition switch sensor All sensors with a switch that belongs to a route must directly be linked to the same route.
  • 13. Graph Pattern Matching (Local Search) switchPosition route: Route sp: SwitchPosition switch routeDefinition sensor sensor: Sensor switch: Switch  Search Plan: o Select the first node to be matched o Define an ordering on graph pattern edges  Search is restarted from scratch each time 1 2 0 3 4 straight left
  • 14. Incremental Graph Pattern Matching switchPosition route: Route sp: SwitchPosition switch routeDefinition sensor sensor: Sensor switch: Switch  Main idea: More space to less time o Cache matches of patterns o Instantly retrieve match (if valid) o Update caches upon model changes o Notify about relevant changes  Approaches: o TREAT, LEAPS, RETE, … o Tools: VIATRA, GROOVE, MoTE, TCore straight left route sp switch sensor r1 sp1 sw1
  • 15. Batch vs. Live Query Scenarios  Batch query (pull / request-driven): 1. Designer selects a query 2. One/All matches are calculated 3. Rule is applied on one/all matches 4. All Steps 1-3 are redone if model changes  Query results obtained upon designer demand  Live query (push / event-driven): 1. Model is loaded 2. Rule system is loaded 3. Calculate full match set 4. Model is changed (rules fired or designer updates) 5. Iterate Steps 3 and 4 until rule system is stopped  Query results are always available for designer
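
The live (push) scenario above rests on caching matches, updating the cache on each model change, and notifying about relevant changes. The sketch below illustrates that idea for the switchWOSignal query only. It is a hand-written cache under simplifying assumptions, not the Rete-based mechanism of EMF-IncQuery, and the class and method names are hypothetical.

```python
class LiveSwitchWOSignal:
    """Keeps the match set of switchWOSignal up to date per model change."""

    def __init__(self):
        self.signal_count = {}  # switch -> number of signals mounted to it
        self.matches = set()    # cached result set
        self.listeners = []     # callbacks notified about match set changes

    def subscribe(self, callback):
        self.listeners.append(callback)

    def _notify(self, kind, switch):
        for callback in self.listeners:
            callback(kind, switch)

    def add_switch(self, switch):
        if switch in self.signal_count:
            return
        self.signal_count[switch] = 0
        self.matches.add(switch)
        self._notify("appeared", switch)

    def mount_signal(self, switch):
        self.signal_count[switch] += 1
        if self.signal_count[switch] == 1:  # first signal: match disappears
            self.matches.discard(switch)
            self._notify("disappeared", switch)

    def unmount_signal(self, switch):
        if self.signal_count[switch] == 0:
            raise ValueError("no signal mounted to " + switch)
        self.signal_count[switch] -= 1
        if self.signal_count[switch] == 0:  # last signal removed: match reappears
            self.matches.add(switch)
            self._notify("appeared", switch)


events = []
query = LiveSwitchWOSignal()
query.subscribe(lambda kind, switch: events.append((kind, switch)))
query.add_switch("sw1")
query.add_switch("sw2")
query.mount_signal("sw1")
after_mount = set(query.matches)
query.unmount_signal("sw1")
```

Each change costs work proportional to the change, and the result set is available at any time without re-running the query.
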
  • 16. INCREMENTAL MODEL QUERIES: THE EMF-INCQUERY PROJECT
  • 17. EMF-IncQuery: An Open Source Eclipse Project • Declarative graph query language • Transitive closure, Negative cond., etc. • Compositional, reusable Definition • Incremental evaluation • Cache result set • Maintain incrementally upon model change Execution • Derived features, • On-the-fly validation • View generation, Notifications, Soft links, Databinding, Features http://eclipse.org/incquery
  • 18. The IncQuery (IQ) Graph Query Language route: Route sp: SwitchPosition routeDefinition sensor: Sensor switch: Switch  IQ: declarative query language o Attribute constraints o Local + global queries o Compositionality + Reusability o Recursion, Negation o Transitive Closure o Syntax: DATALOG style pattern routeSensor(sensor: Sensor) = { TrackElement.sensor(switch,sensor); Switch(switch); SwitchPosition.switch(sp, switch); SwitchPosition(sp); Route.switchPosition(route, sp); Route(route); neg find head(route, sensor); } pattern head(R, Sen) = { Route.routeDefinition(R, Sen); } ModelQuery(A,B): • tuples of model elements A, B • satisfying the query condition • enumerate 1 / all instances • A,B can be input or output switchPosition switch sensor
  • 19. TOOL DEMO: INCQUERY Development Tools Query Explorer Pattern Editor Queries are applied & updated on-the-fly • Works with most EMF editors out-of-the-box • Reveals matches as selection
  • 20. Incremental Query Evaluation by RETE  AUTOSAR well-formedness validation rule Communication channel Logical signal Mapping Physical signal  Instance model Invalid model fragment Valid model fragment
  • 21. Incremental Query Evaluation by RETE Read the changes in the result set (deltas) join join antijoin Result set Communication channel Logical signal Mapping Physical signal
  • 22. Performance of EMF-INCQUERY  Incremental graph queries based on Rete  Built for the Eclipse Modeling Framework (plot: runtime vs. model size; curves: batch queries, incremental queries) Runtime is proportional to the size of the modification.
  • 23. Performance of EMF-INCQUERY Storing partial results (plot: memory consumption vs. model size; curves: incremental queries, batch queries; memory limit marked)
  • 24. Selected Applications (EMF-IncQuery) • Complex traceability • Query driven views • Abstract models by derived objects Toolchain for IMA configs • Connect to Matlab Simulink model • Export: Matlab2EMF • Change model in EMF • Re-import: EMF2Matlab MATLAB-EMF Bridge • Live models (refreshed 25 frame/s) • Complex event processing Gesture recognition • Experiments on open source Java projects • Local search vs. Incremental vs. Native Java code Detection of bad code smells • Rules for operations • Complex structural constraints (as GP) • Hints and guidance • Potentially infinite state space Design Space Exploration • Itemis (developer) • Embraer • Thales • ThyssenKrupp • CERN Known Users
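
The Rete slides in this section show a network of two joins and an antijoin that propagates deltas to a result set. The sketch below wires such a network for the routeSensor pattern of slide 18. It is a simplified illustration, not EMF-IncQuery's implementation: type constraints such as Switch(switch) are omitted, and all class names and the binding representation are assumptions.

```python
from collections import defaultdict


def binding(**values):
    """A tuple of variable bindings, stored as sorted (name, value) pairs."""
    return tuple(sorted(values.items()))


def key_of(tup, keys):
    values = dict(tup)
    return tuple(values[k] for k in keys)


class Node:
    def __init__(self):
        self.children = []  # (child node, input side) pairs

    def connect(self, child, side):
        self.children.append((child, side))

    def emit(self, sign, tup):  # sign: +1 insertion delta, -1 deletion delta
        for child, side in self.children:
            child.receive(side, sign, tup)


class Input(Node):
    def insert(self, tup):
        self.emit(+1, tup)

    def delete(self, tup):
        self.emit(-1, tup)


class Join(Node):
    def __init__(self, keys):
        super().__init__()
        self.keys = keys
        self.memory = {"left": defaultdict(set), "right": defaultdict(set)}

    def receive(self, side, sign, tup):
        k = key_of(tup, self.keys)
        if sign > 0:
            self.memory[side][k].add(tup)
        else:
            self.memory[side][k].discard(tup)
        other = "right" if side == "left" else "left"
        for partner in list(self.memory[other].get(k, ())):
            self.emit(sign, binding(**dict(tup), **{n: v for n, v in partner if n not in dict(tup)}))


class AntiJoin(Node):
    """Passes left tuples that have no matching right tuple."""

    def __init__(self, keys):
        super().__init__()
        self.keys = keys
        self.left = defaultdict(set)
        self.right_count = defaultdict(int)

    def receive(self, side, sign, tup):
        k = key_of(tup, self.keys)
        if side == "left":
            (self.left[k].add if sign > 0 else self.left[k].discard)(tup)
            if self.right_count[k] == 0:
                self.emit(sign, tup)
            return
        before = self.right_count[k]
        self.right_count[k] += sign
        if before == 0 and self.right_count[k] > 0:    # blocker appeared
            for t in list(self.left[k]):
                self.emit(-1, t)
        elif before > 0 and self.right_count[k] == 0:  # last blocker removed
            for t in list(self.left[k]):
                self.emit(+1, t)


class Production(Node):
    def __init__(self):
        super().__init__()
        self.results = set()

    def receive(self, side, sign, tup):
        (self.results.add if sign > 0 else self.results.discard)(tup)


sensor_edge, sp_switch, route_sp, route_def = Input(), Input(), Input(), Input()
join1, join2 = Join(["switch"]), Join(["sp"])
anti, production = AntiJoin(["route", "sensor"]), Production()
sensor_edge.connect(join1, "left")
sp_switch.connect(join1, "right")
join1.connect(join2, "left")
route_sp.connect(join2, "right")
join2.connect(anti, "left")
route_def.connect(anti, "right")
anti.connect(production, "left")

sensor_edge.insert(binding(switch="sw1", sensor="sen1"))
sp_switch.insert(binding(sp="sp1", switch="sw1"))
route_sp.insert(binding(route="r1", sp="sp1"))
before_repair = set(production.results)                  # one violation
route_def.insert(binding(route="r1", sensor="sen1"))    # repair: link route to sensor
after_repair = set(production.results)                   # violation retracted
```

Only the delta caused by the repair travels through the network; the joins are not recomputed.
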
  • 26. Goals of INCQUERY-D  Objectives o Distributed incremental pattern matching o Adaptation of EMF-INCQUERY’s tooling to graph DBs o Executed over cloud infrastructure (COTS hardware)  Achieve scalability by avoiding the memory bottleneck o Sharding separately • Data • Indexers • Query network o In memory: • Index + Query Assumptions: • All Rete nodes fit on a server node • Indexers can be filled efficiently • Modification size ≪ model size • The application requires the complete result set of the query (as opposed to just one match)
  • 27. Dimensions of Scalability  Infrastructure o Number of machines o Available memory / CPU o Network performance o Number of concurrent users  Model o Model size o Model characteristics  Queries o Number of queries o Query complexity Metrics
  • 28. From EMF-INCQUERY to INCQUERY-D EMF-INCQUERY Transaction Rete net Indexer layer In-memory EMF model Production network • Stores intermediate query results • Propagates changes Indexing In-memory storage
  • 29. Database shard 0 INCQUERY-D Architecture Database shard 1 Server 1 Database shard 2 Server 2 Database shard 3 Server 3 Transaction Server 0 Rete net Indexer layer INCQUERY-D Distributed query evaluation network Distributed indexer Model access adapter Distributed indexing, notification Distributed persistent storage Distributed production network • Each intermediate node can be allocated to a different host • Remote internode communication
  • 30. INCQUERY-D Architecture Join Database shard 1 Server 1 Join Database shard 2 Server 2 Triple store (4store), Document DB (Mongo), RDF over Column family (Cumulus) Database shard 3 Server 3 Transaction Database shard 0 In-memory EMF model Server 0 Antijoin Indexer layer INCQUERY-D Akka Indexer Indexer Indexer Indexer
  • 31. Termination Protocol in INCQUERY-D Stack added to each update msg • Registers the Rete nodes the message passes through When a production node is reached, an ACK message is sent back User retrieves query result Database shard 0 Database shard 1 Server 1 Database shard 2 Server 2 Database shard 3 Server 3 Transaction Server 0 INCQUERY-D Join Join Antijoin Indexer Indexer Indexer Indexer
  • 32. IncQuery-D Architectural Layers • Gremlin, Cypher • SPARQL • IQPL (IncQuery) High-Level Query Lang • Distributed Indexers (MONDIX) • SPARQL Low-Level Query Lang • Cayley • Titan • 4store Distributed Graph DB • MongoDB • Cassandra • 4store Native Storage • RDF • XMI / Ecore • Property Graphs Storage Format • Global queries • Complex navigations • Efficient element access by indices • Local queries • Can be transparent (via indexers) • Integrates popular graph storages • Efficient NoSQL storages • Triple stores • Standardized data formats • Popular interchange formats
  • 33. Summary: Key Components of IncQuery-D Distributed Model Storage • Adaptable to different back-end storages • Agnostic to graph representation • TripleStores (RDF), EMF, Property graph Model Access Adapter • Surrogate key to identify distributed elements • Graph manipulation API • Change notifications Distributed Indexer • Type-instance indices, etc. • Stored on multiple servers • Protects against exceeding memory limits Distributed Query Evaluator • Distributed RETE network • Distributed termination protocol • Constructed and deployed by coordinator node Decouple and separately distribute Storage, Indexer and Query layers
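
The termination protocol of slide 31 attaches a stack to each update message, registers the Rete nodes the message passes through, and sends an ACK back once a production node is reached. The sketch below simulates that idea with a single mailbox. It makes a strong simplifying assumption that is not taken from the slides: every node forwards exactly one message per incoming update, so filtering and fan-out are not modeled. The `Simulation` class and the node names are hypothetical.

```python
from collections import deque


class Simulation:
    def __init__(self, successor, production):
        self.successor = successor    # node name -> next node name
        self.production = production  # name of the production node
        self.mailbox = deque()        # in-flight messages
        self.pending = set()          # ids of updates not yet acknowledged
        self.result_set = set()
        self.trace = []               # (kind, node, update id) in processing order

    def send_update(self, update_id, tup, entry):
        self.pending.add(update_id)
        self.mailbox.append(("update", entry, update_id, tup, ()))

    def _ack(self, update_id, stack):
        if stack:  # route the ACK to the most recently visited node
            self.mailbox.append(("ack", stack[-1], update_id, None, stack[:-1]))
        else:      # the ACK arrived back at the origin
            self.pending.discard(update_id)

    def run_one(self):
        kind, node, update_id, tup, stack = self.mailbox.popleft()
        self.trace.append((kind, node, update_id))
        if kind == "ack":
            self._ack(update_id, stack)
        elif node == self.production:
            self.result_set.add(tup)
            self._ack(update_id, stack)
        else:      # register this node on the stack and forward the update
            nxt = self.successor[node]
            self.mailbox.append(("update", nxt, update_id, tup, stack + (node,)))

    def get_results(self):
        """Returns the result set only after all updates are acknowledged."""
        while self.pending:
            self.run_one()
        return set(self.result_set)


sim = Simulation(
    successor={"indexer": "join", "join": "antijoin", "antijoin": "production"},
    production="production",
)
sim.send_update(1, ("r1", "sen1"), entry="indexer")
results = sim.get_results()
ack_path = [node for kind, node, _ in sim.trace if kind == "ack"]
```

The ACK retraces the recorded stack in reverse order, so the origin knows the update has fully propagated before the user reads the result.
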
  • 35. (1) Loading a Query Load Model Update Model Request Result Deploy RETE RETE Network Allocate RETE Cloud Infrastructure Construct RETE Load Query Construct RETE • From EMF-IncQuery specs • Should incorporate infrastructure constraints Deploy RETE • Managed by a coordinator node • Intelligent sharding of RETE nodes
  • 36. (2) Loading a Model Load Model Update Model Request Result Model shards Deploy RETE RETE Network Allocate RETE Maintain Result Set Cloud Infrastructure Construct RETE Model Access Adapter Load Query Load model • Model traversal • Init indexers • Network communication
  • 37. (3) Updating a Model Load Model Update Model Request Result Model shards Deploy RETE RETE Network Allocate RETE Maintain Result Set Cloud Infrastructure Construct RETE Model Access Adapter Load Query Model manipulation • Update messages • Create / Delete
  • 38. (4) Requesting Query Result Load Model Update Model Request Result Model shards Deploy RETE RETE Network Allocate RETE Evaluate Query Maintain Result Set Cloud Infrastructure Construct RETE Model Access Adapter Load Query Evaluate query • Process incoming messages • Propagate along RETE network Retrieve results • instantly
  • 39. (5) Monitoring and Reconfiguration Load Model Update Model Request Result Model shards Deploy RETE RETE Network Allocate RETE Evaluate Query Maintain Result Set Cloud Infrastructure Monitor & Manage Construct RETE Model Access Adapter Load Query Visualized on a web-based dashboard OS metrics JVM metrics Akka metrics Rete metrics
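
Loading a model (step 2 above) traverses the model and initializes the indexers, which are stored on multiple servers. The sketch below shows one way a type-instance index could be split across shards by hashing element identifiers. The hashing scheme, the `ShardedTypeIndexer` class and the element names are assumptions for illustration; the slides do not describe how IncQuery-D assigns elements to indexer shards.

```python
import zlib


class ShardedTypeIndexer:
    """Type-instance index split over several shards (in-process stand-ins for servers)."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]  # each: type -> set of ids

    def shard_of(self, element_id):
        # crc32 is stable across runs, unlike Python's built-in hash() for strings
        return zlib.crc32(element_id.encode("utf-8")) % len(self.shards)

    def add(self, type_name, element_id):
        shard = self.shards[self.shard_of(element_id)]
        shard.setdefault(type_name, set()).add(element_id)

    def remove(self, type_name, element_id):
        shard = self.shards[self.shard_of(element_id)]
        shard.get(type_name, set()).discard(element_id)

    def instances(self, type_name):
        """Collects the instances of a type from all shards."""
        result = set()
        for shard in self.shards:
            result |= shard.get(type_name, set())
        return result


def load_model(elements, indexer):
    """Model traversal: registers every (id, type) pair with the indexer."""
    for element_id, type_name in elements:
        indexer.add(type_name, element_id)


elements = [("sw1", "Switch"), ("sw2", "Switch"), ("sen1", "Sensor"), ("r1", "Route")]
indexer = ShardedTypeIndexer(shard_count=4)
load_model(elements, indexer)
switches = indexer.instances("Switch")
indexer.remove("Switch", "sw2")  # a later model update touches only one shard
```

Because each element lives on exactly one shard, an update message has to reach a single indexer, while a type lookup gathers from all of them.
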
  • 40. DEPLOYMENT PROCESS FOR DISTRIBUTED RETE
  • 41. RETE Deployment Process Query Language Query Predicates RETE Structure route: Route sp: SwitchPosition routeDefinition sensor: Sensor switch: Switch Platform Description Allocation / Mapping Deployment Descriptor pattern routeSensor(sensor: Sensor) = { TrackElement.sensor(switch,sensor); Switch(switch); SwitchPosition.switch(sp, switch); SwitchPosition(sp); Route.switchPosition(route, sp); Route(route); neg find head(route, sensor); } pattern head(R, Sen) = { Route.routeDefinition(R, Sen); } switchPosition switch sensor
  • 42. Tooling: RDF Pattern Language import "http://www.semanticweb.org/ontologies/2011/1/TrainRequirementOntology.owl" pattern posLength(Segment, SegmentLength) { Segment(Segment); Segment.Segment_length(Segment, SegmentLength); check(SegmentLength <= 0); } EMF-IncQuery syntax Vocabulary <railway.rdf> base <http://www.semanticweb.org/ontologies/2011/1/TrainRequirementOntology.owl#> pattern posLength(Segment, SegmentLength) { Segment(Segment); Segment_length(Segment, SegmentLength); check("SegmentLength <= 0"); } segment: Segment segment.length 0 RDF-IncQuery syntax Xbase (compiles to Java) Javascript
  • 43. RETE Deployment Process  Construct language-independent constraints  Resolution of o syntactic sugar o type information Query Language Query Predicates RETE Structure Platform Description Allocation / Mapping Deployment Descriptor Variables route sp switch Parameter sensor Edge: SwitchPosition.switch Edge: TrackElement.sensor Edge: Route.switchPosition Negation: head Constraints
  • 44. RETE Deployment Process  Construct RETE structure (platform independently)  Optimizations: o Model statistics o Expected usage profile Query Language Query Predicates RETE Structure Platform Description Allocation / Mapping Deployment Descriptor join join join
  • 45. RETE Deployment Process  Architecture model (Cloud infrastructure) o Virtual Machines • Memory limits • CPU speed • Storage capacity o Communication Channels • Bandwidth  Specified by a textual DSL (Xtext) Query Language Query Predicates RETE Structure 1 2 3 4 Platform Description Allocation / Mapping Deployment Descriptor
  • 46. RETE Deployment Process Machine Allocated Nodes 1 In1, In2, Join2 2 In3 3 In4 4 Join1, Join3 Query Language Query Predicates RETE Structure 1 2 3 4 Platform Description Allocation / Mapping Deployment Descriptor Join1 Join3 Join2 In1 In2 In3 In4
  • 47. RETE Deployment Process  Configuration scripts for o Deployment o Communication middleware  Derived by automated code generation o Using Eclipse technology: EMF-IncQuery + Xtend Query Language Query Predicates RETE Structure Platform Description Allocation / Mapping Deployment Descriptor
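
The allocation step of the deployment process maps Rete nodes to virtual machines under the memory limits given in the platform description. The slides do not state which allocation strategy is used, so the sketch below substitutes a simple greedy heuristic: place the largest node first, always on the machine with the most free memory. Memory estimates, machine names and the `allocate` function are hypothetical, and communication cost between machines is not modeled.

```python
def allocate(rete_nodes, machines):
    """Greedy allocation of Rete nodes to machines.

    rete_nodes: dict of node name -> estimated memory need (MB)
    machines:   dict of machine name -> memory limit (MB)
    Returns a dict of machine name -> list of allocated node names.
    """
    free = dict(machines)
    allocation = {machine: [] for machine in machines}
    # Largest nodes first; ties broken by name to keep the result deterministic.
    for node, need in sorted(rete_nodes.items(), key=lambda kv: (-kv[1], kv[0])):
        # Machine with the most free memory; if it cannot host the node, none can.
        target = min(free, key=lambda m: (-free[m], m))
        if free[target] < need:
            raise ValueError(f"no machine has {need} MB free for {node}")
        allocation[target].append(node)
        free[target] -= need
    return allocation


rete_nodes = {
    "In1": 300, "In2": 300, "In3": 900, "In4": 900,
    "Join1": 500, "Join2": 200, "Join3": 400,
}
machines = {"vm1": 1024, "vm2": 1024, "vm3": 1024, "vm4": 1024}
allocation = allocate(rete_nodes, machines)
for machine, nodes in allocation.items():
    print(machine, nodes)
```

A table like the one on slide 46 (machine, allocated nodes) is the output of this step and the input to generating the deployment descriptor.
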
  • 49. The Train Benchmark  Model validation workload: o User edits the model o Instant validation of well-formedness constraints o Model is repaired accordingly  Scenario: o Load o Check o Edit o Re-Check  Models: o Randomly generated o Close to real world instances o Following different metrics o Customized distributions o Low number of violations  Queries: o Two simple queries (<2 objects, attributes) o Two complex queries (4-7 joins, negation, etc.) o Validated match sets Batch validation Incremental validation Instance model Read Check ! Edit  ReCheck  100x
  • 50. Evaluation of distributed scalability  Extensions to previous work (single workstation) o Generation of large instance models o Distributed, parallel loading of models o Distributed transformation and validation Benchmark vs. distributed benchmark: • Model size: 1K – 13M vs. 1K – 88M • Load method: batch vs. distributed, parallel • Transformation and validation: single workstation vs. multiple servers
  • 51. Batch scenario vs. incremental scenario (IncQuery-D)  Load and first validation: batch: load the graph to the database and execute the query; incremental: load the graph to the databases, initialize the Rete net and retrieve the results  Transformation: batch: query the graph and delete some elements; incremental: incrementally query the graph and delete some elements, propagate the changes  Revalidation: batch: execute the query; incremental: retrieve the results from the Rete net Load and first validation GraphML Transformation Revalidation DB shards Result set Rete net DB shards Result set Rete net
  • 52. Benchmark environment  Private cloud  Different DBMSs  Query o The DBMS’s own query language o IncQuery-D SPARQL Gremlin
  • 53. Load and first validation (plot: runtime [s] vs. model size [million elements]; systems compared: 4store, Titan, IncQuery-D) 55M model: approx. 15 minutes Rete network’s initialization overhead pays off
  • 54. Model modification (plot: runtime [s] vs. model size [million elements]; systems compared: 4store, Titan, IncQuery-D) 1. Elementary model query 2. Model modification 2 orders of magnitude – Query from the Rete network’s indexer – Propagation of modifications is fast
  • 55. Revalidation (plot: runtime [s] vs. model size [million elements], memory limit marked; systems compared: 4store, Titan, IncQuery-D) Different characteristics Sub-second response time for models with 88M elements
  • 56. Benchmarking Conclusions  Memory consumption o Single workstation: 13M model, 4 GB o Cloud of four servers: 55M model, <4×8 GB  Runtime o Same order of magnitude and similar characteristics to the single workstation tool INCQUERY-D is scalable and significantly more efficient for query evaluation than the native query engines in 4store, Titan and Neo4j
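
The benchmark scenario (Load, Check, Edit, Re-Check) can be mimicked at toy scale with the posLength query of slide 42, which matches segments whose length is not positive. The sketch below runs the four phases and compares a batch re-check against an incrementally maintained match set. It is not the Train Benchmark itself: the model generator, the sizes and all function names are assumptions, and no timing is measured.

```python
import random


def batch_check(lengths):
    """Batch validation: scan every segment for 'length <= 0'."""
    return {seg for seg, length in lengths.items() if length <= 0}


class IncrementalCheck:
    """Incremental validation: the match set is updated per attribute change."""

    def __init__(self, lengths):
        self.matches = batch_check(lengths)  # initial full evaluation at load time

    def on_length_changed(self, seg, new_length):
        if new_length <= 0:
            self.matches.add(seg)
        else:
            self.matches.discard(seg)


def run_scenario(size, repairs, seed=0):
    rng = random.Random(seed)
    # Load: randomly generated model with a low number of violations
    lengths = {f"seg{i}": rng.randint(-1, 100) for i in range(size)}
    live = IncrementalCheck(lengths)
    # Check: first validation
    first = batch_check(lengths)
    # Edit: repair some of the violations found
    for seg in sorted(first)[:repairs]:
        lengths[seg] = 1
        live.on_length_changed(seg, 1)
    # Re-Check: batch rescans the model, incremental just reads its cache
    recheck = batch_check(lengths)
    return first, recheck, set(live.matches)


first, recheck, live_matches = run_scenario(size=1000, repairs=5)
print(len(first), len(recheck), recheck == live_matches)
```

Both variants must return the same match set after the edit; they differ in how much of the model they touch to produce it.
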

Editor's notes

  1. load+init Approx. 20% overhead