In model-driven engineering (MDE), model queries are a core technology behind many tool- and transformation-specific challenges such as design rule validation, model synchronization, view maintenance, and simulation. As software models rapidly increase in size and complexity, traditional MDE tools frequently face scalability issues that decrease engineer productivity and increase development costs. Incremental graph queries offer a graph-pattern-based language for capturing queries; furthermore, the result set of a query is cached and incrementally maintained upon model changes to provide instantaneous query response times. In this talk, we first give a brief overview of the EMF-IncQuery framework (an official Eclipse subproject). Then we discuss how to evaluate incremental queries over a distributed cloud infrastructure (scaling up from a single-node tool to a cluster of nodes) deployed over popular database back-ends (such as Cassandra, 4store, and Neo4j). We present our first benchmarking experiments with IncQuery-D, which highlight that distributed incremental model queries can perform significantly better than the native query technologies of the underlying database back-end, especially for complex queries.
IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineering and Deployment Challenges
1. IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineering and Deployment Challenges
Budapest University of Technology and Economics
Department of Measurement and Information Systems
Dániel Varró
Budapest University of Technology and Economics
Fault Tolerant Systems Research Group
2. Outline of the Talk
Motivation & Background:
• Validation of design rules
• Graph pattern matching
Incremental Model Queries: the EMF-IncQuery framework
• Language
• Execution
Distributed Incremental Model Queries (IncQuery-D)
• Architecture
• Performance Benchmarks
• Distributed model load
• Incremental query evaluation
Main Contributors
o István Ráth (lead)
o Ákos Horváth
o Gábor Bergmann
o Ábel Hegedüs
o Zoltán Ujhelyi
o Benedek Izsó
o Gábor Szárnyas
o Csaba Debreceni
o Dénes Harmath
o József Makai
o Dániel Stein
4. Scalable MDE: The MONDO Project
Models and Languages:
• Large and heterogeneous
• Construction
• Visualization
Queries and Transformations:
• Executed over large models
• Incremental
• Lazy
• Parallel
Collaboration:
• Offline (SVN)
• Online (GDocs)
• Many collaborators
• Secure access
Persistent Storage:
• Efficient
• Secure
• Interoperability
Case studies:
• validate solutions through real case studies
• guided by industrial advisory board
Prototype tools:
• open source software
• open benchmarks
Academic Partners:
• Univ. York (UK), Univ. Autónoma Madrid (ES), ARMINES (FR), BME (HU)
Industrial Partners:
• The Open Group (UK), Uninova (PT), Softeam (FR), Soft-Maint (FR), IKERLAN (ES)
6. Motivation: Early validation of design rules
SystemSignalGroup design rule (from AUTOSAR)
o A SystemSignal and its group must be in the same IPdu
o Challenge: find violations quickly in large models
o New difficulties:
• reverse navigation
• complex manual solution
AUTOSAR:
• standardized SW architecture
of the automotive industry
• now supported by modern modeling tools
Design Rule/Well-formedness constraint:
• each valid car architecture must respect it
• designers are immediately notified if it is violated
Challenge:
• >500 design rules in AUTOSAR tools
• >1 million elements in AUTOSAR models
• models constantly evolve by designers
8. Validation of Well-formedness Constraints
Meta-model
Model
Query
pattern switchWOSignal(sw) {
Switch(sw);
neg find switchHasSignal(sw);
}
pattern switchHasSignal(sw) {
Switch(sw);
Signal(sig);
Signal.mountedTo(sig, sw);
}
Modify
User
Result
9. Model sizes in practice
Models with 10M+ elements are common:
o Car industry
o Avionics
o Source code analysis
Models evolve and change continuously
Validation can take hours
Application          Model size
System models        10^8
Sensor data          10^9
Geospatial models    10^12
Source: Markus Scheidgen, How Big are Models – An Estimation, 2012.
11. What is a model query?
For a programmer:
o A piece of code that searches for parts of the model
For the scientist:
o Query = set of constraints that have to be satisfied by
(parts of) the (graph) model
o Result = set of model element tuples that satisfy the
constraints of the query
o Match = bind constraint variables to model elements
A query engine supports
o the definition & execution of model queries
Query(A,B) = ⋀ᵢ condᵢ(Aᵢ, Bᵢ)
• all tuples of model elements a, b
• satisfying the query condition
• along the match A=a and B=b
• parameters A, B can be input/output
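The constraint-satisfaction reading of a query can be made concrete with a small sketch (plain Python with invented names, not the EMF-IncQuery API): variables are bound to model elements, and a tuple belongs to the result set iff it satisfies every constraint.

```python
# Hypothetical encoding of "Query = set of constraints, Result = tuples
# satisfying them, Match = binding of variables to model elements".
from itertools import product

def evaluate_query(model_elements, variables, constraints):
    """Enumerate all bindings of query variables to model elements
    that satisfy every constraint of the query."""
    results = []
    for values in product(model_elements, repeat=len(variables)):
        match = dict(zip(variables, values))    # bind the variables
        if all(cond(match) for cond in constraints):
            results.append(tuple(values))       # a tuple in the result set
    return results

# Toy model: numbers; query Q(A, B) with the single condition A < B.
matches = evaluate_query([1, 2, 3], ["A", "B"],
                         [lambda m: m["A"] < m["B"]])
# matches == [(1, 2), (1, 3), (2, 3)]
```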
12. Graph Pattern Matching for Queries
[Pattern graph L over model graph G: route: Route –switchPosition→ sp: SwitchPosition –switch→ switch: Switch –sensor→ sensor: Sensor, plus route –routeDefinition→ sensor]
Match:
o m: L → G (graph morphism)
o As a CSP:
• Variables: nodes of L
• Constraints: edges of L
• Domain values: nodes of G
o Complexity: |G|^|L|
All sensors with a switch that belongs to a route must directly be linked to the same route.
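The CSP formulation above can be sketched as brute-force enumeration (an illustrative Python encoding, not from the talk): every assignment of pattern nodes to graph nodes is tried, which is exactly the |G|^|L| worst case.

```python
# CSP view of pattern matching: variables = nodes of L, domain = nodes
# of G, constraints = edges of L. Brute force visits |G|^|L| bindings.
from itertools import product

def match_pattern(pattern_edges, pattern_nodes, graph_edges, graph_nodes):
    matches = []
    for values in product(graph_nodes, repeat=len(pattern_nodes)):
        binding = dict(zip(pattern_nodes, values))
        # every pattern edge must map onto a graph edge with the same label
        if all((binding[src], label, binding[dst]) in graph_edges
               for (src, label, dst) in pattern_edges):
            matches.append(binding)
    return matches

# Tiny instance of the railway pattern:
# route -switchPosition-> sp -switch-> sw
pattern = [("route", "switchPosition", "sp"), ("sp", "switch", "sw")]
graph = {("r1", "switchPosition", "sp1"), ("sp1", "switch", "sw1")}
found = match_pattern(pattern, ["route", "sp", "sw"],
                      graph, ["r1", "sp1", "sw1"])
# found == [{'route': 'r1', 'sp': 'sp1', 'sw': 'sw1'}]
```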
13. Graph Pattern Matching (Local Search)
[Same pattern graph, with the search steps numbered 0–4]
Search Plan:
o Select the first node to be matched
o Define an ordering on graph pattern edges
Search is restarted from scratch each time
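The search-plan idea can be sketched as follows (assumed data layout; the function and adjacency-index names are hypothetical): a fixed edge ordering lets the matcher extend partial matches edge by edge instead of enumerating all bindings.

```python
# Local-search matching driven by a search plan: an ordered list of
# pattern edges, applied to extend partial matches step by step.
def local_search(plan, adjacency, seeds):
    """plan: ordered (src_var, label, dst_var) triples;
    adjacency: {(node, label): set of neighbour nodes};
    seeds: candidate bindings for the first variable."""
    first_var = plan[0][0]
    partial = [{first_var: s} for s in seeds]
    for (src, label, dst) in plan:
        extended = []
        for binding in partial:
            for nxt in adjacency.get((binding[src], label), ()):
                if binding.get(dst) in (None, nxt):   # consistent extension
                    extended.append({**binding, dst: nxt})
        partial = extended
    return partial

adjacency = {("r1", "switchPosition"): {"sp1"}, ("sp1", "switch"): {"sw1"}}
plan = [("route", "switchPosition", "sp"), ("sp", "switch", "sw")]
result = local_search(plan, adjacency, ["r1"])
# result == [{'route': 'r1', 'sp': 'sp1', 'sw': 'sw1'}]
```

Note that the whole search is re-run from scratch on every model change, which motivates the incremental approach on the next slide.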
14. Incremental Graph Pattern Matching
[Same pattern graph; cached partial match: route = r1, sp = sp1, switch = sw1]
Main idea: trade more space for less time
o Cache the matches of patterns
o Instantly retrieve matches (while valid)
o Update caches upon model changes
o Notify about relevant changes
Approaches:
o TREAT, LEAPS, RETE, …
o Tools: VIATRA, GROOVE, MoTE, TCore
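A minimal sketch of the caching idea (heavily simplified compared to a real Rete implementation; the class and callback names are invented): the match set is kept in memory, updated on each elementary change, and listeners are notified of the deltas.

```python
# Incremental maintenance of a single-edge pattern's match set:
# "Signal mountedTo Switch". Reads are instant; writes update the cache.
class IncrementalCache:
    def __init__(self):
        self.matches = set()      # cached result set
        self.listeners = []       # notified about relevant deltas

    def on_edge_added(self, sig, sw):
        self.matches.add((sig, sw))
        for cb in self.listeners:
            cb("added", (sig, sw))

    def on_edge_removed(self, sig, sw):
        self.matches.discard((sig, sw))
        for cb in self.listeners:
            cb("removed", (sig, sw))

cache = IncrementalCache()
deltas = []
cache.listeners.append(lambda kind, m: deltas.append((kind, m)))
cache.on_edge_added("sig1", "sw1")    # model change -> cache + notification
cache.on_edge_removed("sig1", "sw1")
# deltas == [("added", ("sig1", "sw1")), ("removed", ("sig1", "sw1"))]
```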
15. Batch vs. Live Query Scenarios
Batch query
(pull / request-driven):
1. Designer selects a query
2. One/All matches are
calculated
3. Rule is applied on one/all
matches
4. All Steps 1-3 are redone if
model changes
Query results obtained
upon designer demand
Live query
(push / event-driven):
1. Model is loaded
2. Rule system is loaded
3. Calculate full match set
4. Model is changed (rules
fired or designer updates)
5. Iterate Steps 3 and 4 until
rule system is stopped
Query results are always
available for designer
17. EMF-IncQuery: An Open Source Eclipse Project
Definition:
• Declarative graph query language
• Transitive closure, negative conditions, etc.
• Compositional, reusable
Execution:
• Incremental evaluation
• Cache result set
• Maintain incrementally upon model change
Features:
• Derived features
• On-the-fly validation
• View generation, notifications, soft links, databinding
http://eclipse.org/incquery
18. The IncQuery (IQ) Graph Query Language
[Pattern graph: route: Route –switchPosition→ sp: SwitchPosition –switch→ switch: Switch –sensor→ sensor: Sensor, plus route –routeDefinition→ sensor]
IQ: declarative query language
o Attribute constraints
o Local + global queries
o Compositionality + Reusability
o Recursion, Negation, Transitive Closure
o Syntax: DATALOG style
pattern routeSensor(sensor: Sensor) = {
	TrackElement.sensor(switch, sensor);
	Switch(switch);
	SwitchPosition.switch(sp, switch);
	SwitchPosition(sp);
	Route.switchPosition(route, sp);
	Route(route);
	neg find head(route, sensor);
}
pattern head(R, Sen) = {
	Route.routeDefinition(R, Sen);
}
ModelQuery(A,B):
• tuples of model elements A, B
• satisfying the query condition
• enumerate one / all instances
• A, B can be input or output
19. TOOL DEMO: INCQUERY Development Tools
Query Explorer
Pattern Editor
Queries are applied & updated on-the-fly
• Works with most EMF
editors out-of-the-box
• Reveals matches as
selection
20. Incremental Query Evaluation by RETE
AUTOSAR well-formedness validation rule
Communication
channel
Logical signal Mapping Physical signal
Instance model
Invalid model fragment
Valid model fragment
21. Incremental Query Evaluation by RETE
• Load the model into the Rete network
• Propagate the updates through the network
• Read the changes in the result set (deltas)
[Rete network: indexers feeding two join nodes and an antijoin node, maintaining the result set for the communication channel / logical signal / mapping / physical signal pattern]
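The delta propagation through a join node can be sketched like this (a toy structure in Python with invented names, assuming a drastically simplified Rete node; not IncQuery-D code): each side of the join is indexed on the shared variable, and an insertion emits only the newly completed matches as deltas.

```python
# A Rete-style join node: stores partial results from both inputs
# and propagates only the *changes* to its sink.
class JoinNode:
    """Joins (a, b) tuples from the left with (b, c) from the right on b."""
    def __init__(self, sink):
        self.left, self.right, self.sink = {}, {}, sink

    def left_insert(self, a, b):
        self.left.setdefault(b, set()).add(a)
        for c in self.right.get(b, ()):        # emit only new combinations
            self.sink(("insert", (a, b, c)))

    def right_insert(self, b, c):
        self.right.setdefault(b, set()).add(c)
        for a in self.left.get(b, ()):
            self.sink(("insert", (a, b, c)))

result_set = []
join = JoinNode(result_set.append)
join.left_insert("route1", "sp1")     # no counterpart yet, nothing emitted
join.right_insert("sp1", "switch1")   # completes a match -> delta emitted
# result_set == [("insert", ("route1", "sp1", "switch1"))]
```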
22. Performance of EMF-INCQUERY
Incremental graph queries based on Rete, built for the Eclipse Modeling Framework
[Chart: runtime vs. model size for batch and incremental queries]
Runtime is proportional to the size of the modification.
23. Performance of EMF-INCQUERY
[Chart: memory consumption vs. model size for incremental and batch queries, with the memory limit marked]
Storing partial results increases memory consumption.
24. Selected Applications (EMF-IncQuery)
Toolchain for IMA configs:
• Complex traceability
• Query-driven views
• Abstract models by derived objects
MATLAB-EMF Bridge:
• Connect to Matlab Simulink model
• Export: Matlab2EMF
• Change model in EMF
• Re-import: EMF2Matlab
Gesture recognition:
• Live models (refreshed at 25 frames/s)
• Complex event processing
Detection of bad code smells:
• Experiments on open-source Java projects
• Local search vs. incremental vs. native Java code
Design Space Exploration:
• Rules for operations
• Complex structural constraints (as graph patterns)
• Hints and guidance
• Potentially infinite state space
Known Users:
• Itemis (developer) • Embraer • Thales • ThyssenKrupp • CERN
26. Goals of INCQUERY-D
Objectives
o Distributed incremental pattern matching
o Adaptation of EMF-INCQUERY’s tooling to graph DBs
o Executed over cloud infrastructure (COTS hardware)
Achieve scalability by avoiding memory bottleneck
o Sharding separately
• Data
• Indexers
• Query network
o In memory:
• Index + Query
Assumptions
• All Rete nodes fit on a server node
• Indexers can be filled efficiently
• Modification size ≪ model size
• The application requires the complete result set of the query (as opposed to just one match)
27. Dimensions of Scalability
Infrastructure
o Number of machines
o Available memory / CPU
o Network performance
o Number of concurrent users
Model
o Model size
o Model characteristics
Queries
o Number of queries
o Query complexity
Metrics
28. From EMF-INCQUERY to INCQUERY-D
EMF-INCQUERY stack (from bottom up): in-memory EMF model → indexer layer → Rete net → transaction
• In-memory storage
• Indexing
• Production network: stores intermediate query results, propagates changes
29. INCQUERY-D Architecture
[Diagram: database shards 0–3 hosted on Servers 0–3; INCQUERY-D adds a transaction layer, a Rete net and an indexer layer on top]
o Distributed query evaluation network
o Distributed indexer and model access adapter (distributed indexing, notification)
o Distributed persistent storage
Distributed production network:
• Each intermediate node can be allocated to a different host
• Remote inter-node communication
30. INCQUERY-D Architecture
[Diagram: transaction layer and in-memory EMF model on Server 0; Join, Join and Antijoin Rete nodes on Servers 1–3; four indexers in the indexer layer over database shards 0–3; nodes communicate via Akka]
Back-end options: triple store (4store), document DB (MongoDB), RDF over column family (Cumulus)
31. Termination Protocol in INCQUERY-D
[Diagram: the distributed Rete network of Join, Join and Antijoin nodes with indexers over database shards 0–3]
o A stack is added to each update message
• It registers the Rete nodes the message passes through
o When a production node is reached, an ACK message is sent back along the stack
o The user then retrieves a consistent query result
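The stack-based protocol can be sketched as follows (message shapes and node representation are invented for illustration; real IncQuery-D uses asynchronous Akka actors): each update message records the Rete nodes it traverses, and reaching the production node triggers ACKs back along that recorded path, so the sender knows when the update is fully processed.

```python
# Termination protocol sketch: a stack on each update message registers
# the Rete nodes it passes through; the production node ACKs them back.
def propagate(node, payload, stack, acks):
    stack = stack + [node["name"]]            # register this Rete node
    if node.get("production"):
        # production node reached: acknowledge every node on the path
        for visited in reversed(stack):
            acks.append(("ack", visited))
    else:
        for child in node["children"]:
            propagate(child, payload, stack, acks)

# A tiny three-node network: indexer -> join -> production node.
production = {"name": "result", "production": True}
join = {"name": "join", "children": [production]}
indexer = {"name": "indexer", "children": [join]}

acks = []
propagate(indexer, ("insert", ("sig1", "sw1")), [], acks)
# acks == [("ack", "result"), ("ack", "join"), ("ack", "indexer")]
```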
32. IncQuery-D Architectural Layers
High-Level Query Language (Gremlin, Cypher; SPARQL; IQPL (IncQuery)):
• Global queries
• Complex navigations
Low-Level Query Language (Distributed Indexers (MONDIX); SPARQL):
• Efficient element access by indices
• Local queries
Distributed Graph DB (Cayley; Titan; 4store):
• Can be transparent (via indexers)
• Integrates popular graph storages
Native Storage (MongoDB; Cassandra; 4store):
• Efficient NoSQL storages
• Triple stores
Storage Format (RDF; XMI / Ecore; Property Graphs):
• Standardized data formats
• Popular interchange formats
33. Summary: Key Components of IncQuery-D
Distributed Model Storage:
• Adaptable to different back-end storages
• Agnostic to graph representation: triple stores (RDF), EMF, property graphs
Model Access Adapter:
• Surrogate keys to identify distributed elements
• Graph manipulation API
• Change notifications
Distributed Indexer:
• Type-instance indices, etc.
• Stored on multiple servers
• Protects against exceeding memory limits
Distributed Query Evaluator:
• Distributed RETE network
• Distributed termination protocol
• Constructed and deployed by a coordinator node
Decouple and separately distribute the Storage, Indexer and Query layers.
35. (1) Loading a Query
[Workflow: Load Query → Construct RETE → Allocate RETE → Deploy RETE → RETE Network, running over the cloud infrastructure; later steps: Load Model, Update Model, Request Result]
Construct RETE:
• From EMF-IncQuery specifications
• Should incorporate infrastructure constraints
Deploy RETE:
• Managed by a coordinator node
• Intelligent sharding of RETE nodes
36. (2) Loading a Model
[Workflow as before, now including the model shards, the Model Access Adapter and result-set maintenance]
Load model:
• Model traversal
• Init indexers
• Network communication
37. (3) Updating a Model
[Workflow as before]
Model manipulation:
• Update messages
• Create / Delete
38. (4) Requesting Query Result
[Workflow as before, now including the Evaluate Query step]
Evaluate query:
• Process incoming messages
• Propagate along the RETE network
Retrieve results:
• instantly
39. (5) Monitoring and Reconfiguration
[Workflow as before, extended with a Monitor & Manage component]
Visualized on a web-based dashboard:
• OS metrics • JVM metrics • Akka metrics • Rete metrics
47. RETE Deployment Process
Configuration scripts for:
o Deployment
o Communication middleware
Derived by automated code generation:
o Using Eclipse technology: EMF-IncQuery + Xtend
[Pipeline: Query Language → Query Predicates → RETE Structure; Platform Description → Allocation / Mapping → Deployment Descriptor]
49. The Train Benchmark
Model validation workload:
o User edits the model
o Instant validation of
well-formedness constraints
o Model is repaired accordingly
Scenario:
o Load
o Check
o Edit
o Re-Check
Models:
o Randomly generated
o Close to real world instances
o Following different metrics
o Customized distributions
o Low number of violations
Queries:
o Two simple queries
(<2 objects, attributes)
o Two complex queries
(4-7 joins, negation, etc.)
o Validated match sets
[Scenario: Read → Check → (Edit → Re-Check) ×100 on the instance model, run both as batch validation and as incremental validation]
50. Evaluation of distributed scalability
Extensions to previous work (single workstation)
o Generation of large instance models
o Distributed, parallel loading of models
o Distributed transformation and validation
                               Benchmark            Distributed benchmark
Model size                     1K – 13M             1K – 88M
Load method                    Batch                Distributed, parallel
Transformation and validation  Single workstation   Multiple servers
51. Batch scenario vs. incremental scenario (IncQuery-D)
Batch scenario:
• Load and first validation: load the graph to the databases and execute the query
• Transformation: query the graph and delete some elements
• Revalidation: execute the query
Incremental scenario (IncQuery-D):
• Load and first validation: load the graph, initialize the Rete net and retrieve the results
• Transformation: incrementally query the graph and delete some elements, propagate the changes
• Revalidation: retrieve the results from the Rete net
[Diagram: the GraphML model is loaded into the DB shards and the Rete net; the result set is maintained through the load and first validation, transformation, and revalidation phases]
52. Benchmark environment
Private cloud
Different DBMSs
Query implementations:
o The DBMS's own query language (SPARQL, Gremlin)
o IncQuery-D
53. Load and first validation
[Chart: runtime (s, log scale, 1–4096) vs. model size (million elements); series: 4store, IncQuery-D over Titan, IncQuery-D over 4store]
• 55M model: approx. 15 minutes
• The Rete network's initialization overhead pays off later
54. Model modification
[Chart: runtime (s, log scale, 1–256) vs. model size (million elements); series: 4store, IncQuery-D over Titan, IncQuery-D over 4store]
1. Elementary model query
2. Model modification
About 2 orders of magnitude faster:
– Query is answered from the Rete network's indexer
– Propagation of modifications is fast
55. Revalidation
[Chart: runtime (s, log scale, 0.01–128) vs. model size (million elements); series: 4store, IncQuery-D over Titan, IncQuery-D over 4store; the memory limit is marked]
• Different characteristics
• Sub-second response time for models with 88M elements
56. Benchmarking Conclusions
Memory consumption
o Single workstation: 13M model, 4 GB
o Cloud of four servers: 55M model, <4×8 GB
Runtime
o Same order of magnitude and similar characteristics to
the single workstation tool
INCQUERY-D is scalable and significantly more efficient for query
evaluation than the native query engines in 4store, Titan and Neo4j