SERENE 2014 School: Incremental Model Queries over the Cloud
1. Distributed Incremental
Model Queries over the Cloud
Budapest University of Technology and Economics
Department of Measurement and Information Systems
Dániel Varró
Budapest University of Technology and Economics
Fault Tolerant Systems Research Group
2. Outline of the Talk
Motivation & Background:
• Change detection in CPS
• Design Space Exploration
Incremental Model Queries:
The EMF-IncQuery framework
• Language - Execution
Distributed Incremental
Model Queries (IncQuery-D)
• Architecture -
Performance Benchmarks
• Distributed model load
• Incremental query evaluation
Main Contributors
o István Ráth (lead)
o Ákos Horváth
o Gábor Bergmann
o Ábel Hegedüs
o Zoltán Ujhelyi
o Benedek Izsó
o Gábor Szárnyas
o Csaba Debreceni
o Dénes Harmath
o József Makai
o Dániel Stein
3. Challenges for IoT / CPS
Cyber
world
Physical
world
Problem
Solution
scheme
Deployment
Service
Solution
pattern
Component
service
offering
Challenge:
Detect changes
• in system state
• in environment
Abstractions
Design space
exploration
5. Big Data Analytics for CPS
Sensors / Services Data and Event sources Cloud based apps
Data
Store
Data
Store
Events
Cloud based
Computation
Polling
Events
6. Challenge: Change Detection in CPS
Sensors / Services Data and Event sources Cloud based apps
Data
Store
Data
Store
Events
Cloud based
Computation
Polling
Events
?
7. Change Detection in CPS by Incremental Queries
Sensors / Services Data and Event sources Cloud based apps
Polling
Data
Store
Events
Data
Store
Unified Change Detection by
Distributed Incremental Queries
Cloud based
Computation
9. Motivation: Early validation of design rules
SystemSignalGroup design rule (from AUTOSAR)
o A SystemSignal and its group must be in the same IPdu
o Challenge: find violations quickly in large models
o New difficulties
• reverse
navigation
• complex
manual
solution
AUTOSAR:
• standardized SW architecture
of the automotive industry
• now supported by modern modeling tools
Design Rule/Well-formedness constraint:
• each valid car architecture needs to respect
• designers are immediately notified if violated
Challenge:
• 500 design rules in AUTOSAR tools
• 1 million elements in AUTOSAR models
• models constantly evolve by designers
11. Validation of Well-formedness Constraints
Meta-model
Model
Query
pattern switchWOSignal(sw) {
Switch(sw);
neg find switchHasSignal(sw);
}
pattern switchHasSignal(sw) {
Switch(sw);
Signal(sig);
Signal.mountedTo(sig, sw);
}
Modify
User
Result
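The two patterns above can be read as a set difference: all switches minus the switches that have a signal mounted. A minimal Python sketch of that reading, assuming a hypothetical in-memory model made of a set of switch ids and a set of (signal, switch) pairs standing in for `Signal.mountedTo`:

```python
# Hypothetical toy model: element ids are plain strings.
switches = {"sw1", "sw2", "sw3"}
mounted_to = {("sig1", "sw1"), ("sig2", "sw3")}  # Signal.mountedTo(sig, sw)

def switch_has_signal(switches, mounted_to):
    """Matches of switchHasSignal(sw): switches with at least one mounted signal."""
    return {sw for (_sig, sw) in mounted_to if sw in switches}

def switch_wo_signal(switches, mounted_to):
    """Matches of switchWOSignal(sw): 'neg find' becomes a set difference."""
    return switches - switch_has_signal(switches, mounted_to)

violations = switch_wo_signal(switches, mounted_to)
print(sorted(violations))  # ['sw2']
```

This batch version recomputes the whole result after every modification, which is the cost the incremental approach tries to avoid.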
12. Model sizes in practice
Models with 10M+ elements are common:
o Car industry
o Avionics
o Source code analysis
Models evolve and change continuously
Validation can take hours
Application         Model size
System models       10^8
Sensor data         10^9
Geospatial models   10^12
Source: Markus Scheidgen, How Big are Models – An Estimation, 2012.
14. What is a model query?
For a programmer:
o A piece of code that searches for parts of the model
For the scientist:
o Query = set of constraints that have to be satisfied by
(parts of) the (graph) model
o Result = set of model element tuples that satisfy the
constraints of the query
o Match = binding of constraint variables to model elements
A query engine supports
o the definition and execution
of model queries
Query(A,B): ∧ cond_i(A_i, B_i)
• all tuples of model elements a,b
• satisfying the query condition
• along the match A=a and B=b
• parameters A,B can be input/ output
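The input/output reading of query parameters can be sketched in a few lines of Python. Everything here is an illustrative assumption: integers stand in for model elements, and `query` is a hypothetical helper, not an API of any query engine.

```python
from itertools import product

def query(model, conditions, a=None, b=None):
    """Return all (a, b) tuples that satisfy every condition.

    Passing a value for a or b binds that parameter (input);
    leaving it as None keeps it free (output).
    """
    dom_a = [a] if a is not None else model
    dom_b = [b] if b is not None else model
    return [(x, y) for x, y in product(dom_a, dom_b)
            if all(cond(x, y) for cond in conditions)]

model = [1, 2, 3, 4]  # stand-ins for model elements
conds = [lambda x, y: x < y, lambda x, y: (x + y) % 2 == 1]

all_matches = query(model, conds)       # both parameters are outputs
bound = query(model, conds, a=1)        # A is an input, B is an output
check = query(model, conds, a=2, b=3)   # both bound: a pure check
print(all_matches)  # [(1, 2), (1, 4), (2, 3), (3, 4)]
```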
15. Graph Pattern Matching for Queries
route: Route sp: SwitchPosition
routeDefinition
sensor: Sensor switch: Switch
Match:
o m: L → G
(graph morphism)
o CSP:
• Variables: Nodes of L
• Constraints: Edges of L
• Domain values: G
o Complexity: |G|^|L|
L
G
straight
left
switchPosition
switch
sensor
All sensors with a switch that belongs to a route must directly be linked to the same route.
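The CSP view and its |G|^|L| bound can be made concrete with a brute-force matcher that tries every assignment of graph nodes to pattern variables. This is only a sketch: the tiny railway fragment and all names are hypothetical, and the match is a plain (not necessarily injective) morphism.

```python
from itertools import product

# Host graph G: node -> type, plus labelled edges (hypothetical fragment).
nodes = {"r1": "Route", "sp1": "SwitchPosition", "sw1": "Switch", "s1": "Sensor"}
edges = {("r1", "switchPosition", "sp1"), ("sp1", "switch", "sw1"),
         ("sw1", "sensor", "s1"), ("r1", "routeDefinition", "s1")}

# Pattern graph L: variable -> type, plus labelled edges between variables.
pattern_nodes = {"route": "Route", "sp": "SwitchPosition",
                 "switch": "Switch", "sensor": "Sensor"}
pattern_edges = [("route", "switchPosition", "sp"), ("sp", "switch", "switch"),
                 ("switch", "sensor", "sensor"),
                 ("route", "routeDefinition", "sensor")]

def brute_force_matches(nodes, edges, pattern_nodes, pattern_edges):
    variables = list(pattern_nodes)
    matches, tried = [], 0
    # Enumerate every assignment of graph nodes to variables: |G|^|L| candidates.
    for values in product(nodes, repeat=len(variables)):
        tried += 1
        m = dict(zip(variables, values))
        typed = all(nodes[m[v]] == t for v, t in pattern_nodes.items())
        linked = all((m[s], lbl, m[d]) in edges for s, lbl, d in pattern_edges)
        if typed and linked:
            matches.append(m)
    return matches, tried

matches, tried = brute_force_matches(nodes, edges, pattern_nodes, pattern_edges)
print(tried)  # 4 ** 4 == 256 candidate assignments for a single match
```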
16. Graph Pattern Matching (Local Search)
switchPosition
route: Route sp: SwitchPosition
switch
routeDefinition
sensor
sensor: Sensor switch: Switch
Search Plan:
o Select the first node
to be matched
o Define an ordering on
graph pattern edges
Search is restarted from
scratch each time
1
2
0
3
4
straight
left
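A search plan can be sketched as a start variable plus an ordered list of pattern edges that are traversed by navigation instead of blind enumeration. The code below is a minimal illustration under stated assumptions: the model fragment is hypothetical, edge labels are assumed to imply the target type, and every edge in the plan must have its source variable bound by the time it is visited.

```python
from collections import defaultdict

# Hypothetical fragment: r1 is complete, r2 lacks its routeDefinition edge.
nodes = {"r1": "Route", "sp1": "SwitchPosition", "sw1": "Switch", "s1": "Sensor",
         "r2": "Route", "sp2": "SwitchPosition", "sw2": "Switch", "s2": "Sensor"}
edges = [("r1", "switchPosition", "sp1"), ("sp1", "switch", "sw1"),
         ("sw1", "sensor", "s1"), ("r1", "routeDefinition", "s1"),
         ("r2", "switchPosition", "sp2"), ("sp2", "switch", "sw2"),
         ("sw2", "sensor", "s2")]

# Search plan: start at `route`, then visit the pattern edges in this order.
plan = [("route", "switchPosition", "sp"), ("sp", "switch", "switch"),
        ("switch", "sensor", "sensor"), ("route", "routeDefinition", "sensor")]

def local_search(nodes, edges, start_var, start_type, plan):
    out = defaultdict(list)               # (source node, label) -> targets
    for src, label, dst in edges:
        out[(src, label)].append(dst)

    def extend(binding, step):
        if step == len(plan):
            yield dict(binding)
            return
        src_var, label, dst_var = plan[step]
        for cand in out.get((binding[src_var], label), []):
            if dst_var in binding:        # both ends bound: check the edge
                if binding[dst_var] == cand:
                    yield from extend(binding, step + 1)
            else:                         # extend the partial match
                binding[dst_var] = cand
                yield from extend(binding, step + 1)
                del binding[dst_var]

    for node, node_type in nodes.items():
        if node_type == start_type:
            yield from extend({start_var: node}, 0)

matches = list(local_search(nodes, edges, "route", "Route", plan))
```

Calling `local_search` again after a model change repeats the whole traversal, which is the restart-from-scratch behaviour noted on the slide.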
17. Incremental Graph Pattern Matching
switchPosition
route: Route sp: SwitchPosition
switch
routeDefinition
sensor
sensor: Sensor switch: Switch
Main idea: trade more space for less time
o Cache matches of patterns
o Instantly retrieve match (if valid)
o Update caches upon model changes
o Notify about relevant changes
Approaches:
o TREAT, LEAPS, RETE, …
o Tools: VIATRA, GROOVE, MoTE, TCore
straight
left
route sp switch sensor
r1 sp1 sw1
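The four points of the main idea (cache, instant retrieval, update on change, notification) can be sketched for a two-edge pattern. This is a hand-written illustration, not the algorithm of any of the listed tools; the class name and the fixed pattern (sp -switch-> sw -sensor-> sensor) are assumptions.

```python
from collections import defaultdict

class IncrementalPathQuery:
    """Caches all (sp, sw, sensor) with sp -switch-> sw and sw -sensor-> sensor."""

    def __init__(self):
        self.switch_of = defaultdict(set)   # sw -> {sp}
        self.sensor_of = defaultdict(set)   # sw -> {sensor}
        self.matches = set()                # cached result set
        self.listeners = []                 # callbacks: (kind, match)

    def _delta(self, src, label, dst):
        # Only matches that contain the changed edge can appear or disappear.
        if label == "switch":
            return {(src, dst, s) for s in self.sensor_of[dst]}
        if label == "sensor":
            return {(sp, src, dst) for sp in self.switch_of[src]}
        return set()

    def _index(self, label):
        return {"switch": self.switch_of, "sensor": self.sensor_of}.get(label)

    def _notify(self, kind, delta):
        for match in sorted(delta):
            for listener in self.listeners:
                listener(kind, match)

    def add_edge(self, src, label, dst):
        index = self._index(label)
        if index is None:
            return
        key, value = (dst, src) if label == "switch" else (src, dst)
        index[key].add(value)
        delta = self._delta(src, label, dst) - self.matches
        self.matches |= delta
        self._notify("appeared", delta)

    def remove_edge(self, src, label, dst):
        index = self._index(label)
        if index is None:
            return
        delta = self._delta(src, label, dst) & self.matches
        key, value = (dst, src) if label == "switch" else (src, dst)
        index[key].discard(value)
        self.matches -= delta
        self._notify("disappeared", delta)

query = IncrementalPathQuery()
log = []
query.listeners.append(lambda kind, match: log.append((kind, match)))
query.add_edge("sp1", "switch", "sw1")
query.add_edge("sw1", "sensor", "s1")
query.remove_edge("sp1", "switch", "sw1")
```

Reading `query.matches` is a plain lookup; the work is done when edges change and is limited to matches that touch the changed edge.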
19. EMF-IncQuery: An Open Source Eclipse Project
• Declarative graph query
language
• Transitive closure,
Negative cond., etc.
• Compositional, reusable
Definition
• Incremental evaluation
• Cache result set
• Maintain incrementally
upon model change
Execution
• Derived features,
• On-the-fly validation
• View generation,
Notifications, Soft links,
Databinding,
Features
http://eclipse.org/incquery
20. The IncQuery (IQ) Graph Query Language
route: Route sp: SwitchPosition
routeDefinition
sensor: Sensor switch: Switch
IQ: declarative query language
o Attribute constraints
o Local + global queries
o Compositionality + Reusability
o Recursion, Negation,
o Transitive Closure
o Syntax: DATALOG style
pattern routeSensor(sensor: Sensor) = {
TrackElement.sensor(switch,sensor);
Switch(switch);
SwitchPosition.switch(sp, switch);
SwitchPosition(sp);
Route.switchPosition(route, sp);
Route(route);
neg find head(route, sensor);
}
pattern head(R, Sen) = {
Route.routeDefinition(R, Sen);
}
ModelQuery(A,B):
• tuples of model elements A, B
• satisfying the query condition
• enumerate 1 / all instances
• A,B can be input or output
switchPosition
switch
sensor
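Read relationally, `routeSensor` joins three edge relations and then filters with the negated `head` pattern. A batch evaluation can be sketched in Python as follows, assuming each reference is given as a set of (source, target) pairs and that the type constraints are implied by those relations; the data is hypothetical.

```python
def route_sensor(switch_position, sp_switch, te_sensor, route_definition):
    """Batch evaluation of routeSensor: join three edge relations, then drop
    (route, sensor) pairs matched by head (the 'neg find' constraint)."""
    result = set()
    for route, sp in switch_position:        # Route.switchPosition(route, sp)
        for sp2, switch in sp_switch:        # SwitchPosition.switch(sp, switch)
            if sp2 != sp:
                continue
            for te, sensor in te_sensor:     # TrackElement.sensor(switch, sensor)
                if te == switch and (route, sensor) not in route_definition:
                    result.add(sensor)       # the only pattern parameter
    return result

switch_position = {("r1", "sp1"), ("r2", "sp2")}
sp_switch = {("sp1", "sw1"), ("sp2", "sw2")}
te_sensor = {("sw1", "s1"), ("sw2", "s2")}
route_definition = {("r1", "s1")}            # r2 lacks its routeDefinition

violating = route_sensor(switch_position, sp_switch, te_sensor, route_definition)
print(violating)  # {'s2'}
```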
21. Incremental Query Evaluation by RETE
AUTOSAR well-formedness validation rule
Communication
channel
Logical signal Mapping Physical signal
Instance model
Invalid model fragment
Valid model fragment
22. Incremental Query Evaluation by RETE
Steps shown on the Rete network:
• Fill the input nodes
• Modify the model
• Propagate the changes through the network
• Read the changes in the
result set (deltas)
join
join
antijoin
Result set
Communication
channel
Logical signal Mapping Physical signal
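A Rete-style network with two join nodes and one antijoin node, as in the diagram, can be sketched as below. This is a simplified illustration and not the EMF-IncQuery implementation: it assumes set semantics without duplicate updates, and for concreteness it is wired to the railway `routeSensor` pattern rather than the AUTOSAR rule on the slide.

```python
from collections import defaultdict

class Production:
    """Holds the result set and a log of its changes (deltas)."""
    def __init__(self):
        self.results, self.deltas = set(), []
    def update(self, sign, tup):
        if sign == "+":
            self.results.add(tup)
        else:
            self.results.discard(tup)
        self.deltas.append((sign, tup))

class Join:
    """Pairs left and right tuples that agree on a key; emits left + right."""
    def __init__(self, left_key, right_key, emit):
        self.key = {"L": left_key, "R": right_key}
        self.memory = {"L": defaultdict(set), "R": defaultdict(set)}
        self.emit = emit
    def update(self, sign, tup, side):
        k = self.key[side](tup)
        if sign == "+":
            self.memory[side][k].add(tup)
        else:
            self.memory[side][k].discard(tup)
        other = "R" if side == "L" else "L"
        for partner in list(self.memory[other][k]):
            left, right = (tup, partner) if side == "L" else (partner, tup)
            self.emit(sign, left + right)

class AntiJoin:
    """Passes a left tuple only while no right tuple shares its key."""
    def __init__(self, left_key, right_key, emit):
        self.key = {"L": left_key, "R": right_key}
        self.left = defaultdict(set)
        self.right_count = defaultdict(int)
        self.emit = emit
    def update(self, sign, tup, side):
        k = self.key[side](tup)
        if side == "L":
            if sign == "+":
                self.left[k].add(tup)
            else:
                self.left[k].discard(tup)
            if self.right_count[k] == 0:
                self.emit(sign, tup)
            return
        before = self.right_count[k]
        self.right_count[k] += 1 if sign == "+" else -1
        if before == 0:                       # first blocker: retract matches
            for l in list(self.left[k]):
                self.emit("-", l)
        elif self.right_count[k] == 0:        # last blocker gone: restore them
            for l in list(self.left[k]):
                self.emit("+", l)

prod = Production()
anti = AntiJoin(lambda t: (t[0], t[5]), lambda t: t, prod.update)
join2 = Join(lambda t: t[3], lambda t: t[0], lambda s, t: anti.update(s, t, "L"))
join1 = Join(lambda t: t[1], lambda t: t[0], lambda s, t: join2.update(s, t, "L"))

join1.update("+", ("r1", "sp1"), "L")   # Route.switchPosition(r1, sp1)
join1.update("+", ("sp1", "sw1"), "R")  # SwitchPosition.switch(sp1, sw1)
join2.update("+", ("sw1", "s1"), "R")   # TrackElement.sensor(sw1, s1)
anti.update("+", ("r1", "s1"), "R")     # Route.routeDefinition(r1, s1)
```

After the last update the violation disappears from `prod.results`, and `prod.deltas` records one appearance followed by one retraction, which is the kind of delta a client reads from the result set.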
23. Performance of EMF-INCQUERY
Incremental graph queries based on Rete
Built for the Eclipse Modeling Framework
model size
runtime
batch
queries
incremental
queries
Runtime is proportional to
the size of the modification.
24. Performance of EMF-INCQUERY
Storing partial results
memory consumption
incremental
queries
batch
queries
memory
limit
model size
25. Selected Applications (EMF-IncQuery)
• Complex traceability
• Query driven views
• Abstract models by
derived objects
Toolchain for
IMA configs
• Connect to Matlab
Simulink model
• Export: Matlab2EMF
• Change model in EMF
• Re-import:
EMF2Matlab
MATLAB-EMF
Bridge
• Live models
(refreshed at 25
frames/s)
• Complex event
processing
Gesture
recognition
• Experiments on open
source Java projects
• Local search vs.
Incremental vs.
Native Java code
Detection of
bad code smells
• Rules for operations
• Complex structural
constraints (as GP)
• Hints and guidance
• Potentially infinite
state space
Design Space
Exploration
• Itemis (developer)
• Embraer
• Thales
• ThyssenKrupp
• CERN
Known Users
27. Goals of INCQUERY-D
Objectives
o Distributed incremental pattern matching
o Adaptation of EMF-INCQUERY’s tooling to graph DBs
o Executed over cloud infrastructure (COTS hardware)
Achieve scalability by avoiding memory bottleneck
o Sharding separately
• Data
• Indexers
• Query network
o In memory:
• Index + Query
Assumptions
• All Rete nodes fit on a server node
• Indexers can be filled efficiently
• Modification size ≪ model size
• The application requires the complete result
set of the query (opposed to just one match)
28. Dimensions of Scalability
Infrastructure
o Number of machines
o Available memory / CPU
o Network performance
o Number of concurrent users
Model
o Model size
o Model characteristics
Queries
o Number of queries
o Query complexity
Metrics
29. INCQUERY-D Architecture
EMF-INCQUERY INCQUERY-D
Join
Database
shard 1
Server 1
Join
Database
shard 2
Server 2
Triple store (4store),
Document DB (Mongo),
RDF over Column family (Cumulus)
Database
shard 3
Server 3
Transaction
Database
shard 0
In-memory
EMF model
Server 0
Antijoin
Rete net
Indexer
layer
Akka
Distributed query evaluation network
Distributed indexer
Model access adapter
Indexing
Indexer Indexer Indexer Indexer
In-memory storage
Distributed indexing,
notification
Production network
• Stores intermediate query results
• Propagates changes
Distributed persistent
storage
Distributed production network
• Each intermediate node can be allocated
to a different host
• Remote internode communication
30. Termination Protocol in INCQUERY-D
Database
shard 0
Stack added to each update message
• Registers the Rete nodes the
message passes through
When a production node is reached,
an ACK message is sent back
Database
shard 1
Server 1
Database
shard 2
Server 2
User retrieves
query result
Database
shard 3
Server 3
Transaction
Server 0
INCQUERY-D
Join
Join
Antijoin
Indexer Indexer Indexer Indexer
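The slide only outlines the termination protocol, so the following is a single-process simulation of one possible reading, not the Akka-based implementation. Assumptions: each update message carries a stack of the Rete nodes it has passed, the production node sends the ACK back along that stack, and propagation has terminated once no message is in flight. All names are hypothetical.

```python
from collections import deque

def propagate(network, production, source, payload):
    """Simulate one update travelling through a Rete network.

    network maps a node name to the list of its child nodes.
    Returns (ack_routes, terminated).
    """
    in_flight = 1                      # the initial update message
    ack_routes = []
    queue = deque([(source, payload, [])])
    while queue:
        node, data, stack = queue.popleft()
        stack = stack + [node]         # register this node on the message
        if node == production:
            # The ACK travels back along the recorded stack, last node first.
            ack_routes.append(list(reversed(stack)))
            in_flight -= 1
            continue
        children = network.get(node, [])
        # One message is consumed, one is sent per child; a message that a
        # node does not forward simply counts as finished.
        in_flight += len(children) - 1
        for child in children:
            queue.append((child, data, stack))
    return ack_routes, in_flight == 0

network = {"indexer1": ["join1"], "join1": ["join2"],
           "join2": ["antijoin"], "antijoin": ["production"]}
acks, terminated = propagate(network, "production", "indexer1", ("+", "sw1"))
print(acks[0])  # ['production', 'antijoin', 'join2', 'join1', 'indexer1']
```

Once `terminated` is true, the result set is consistent with the update and can be handed to the user.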
31. IncQuery-D Architectural Layers
• Gremlin, Cypher
• SPARQL
• IQPL (IncQuery)
High-Level
Query Lang
• Distributed Indexers
(MONDIX)
• SPARQL
Low-Level
Query Lang
• Cayley
• Titan
• 4store
Distributed
Graph DB
• MongoDB
• Cassandra
• 4store
Native
Storage
• RDF
• XMI / Ecore
• Property Graphs
Storage
Format
• Global queries
• Complex navigations
• Efficient element access by indices
• Local queries
• Can be transparent (via indexers)
• Integrates popular graph storages
• Efficient NoSQL storages
• Triple stores
• Standardized data formats
• Popular interchange formats
32. Summary: Key Components of IncQuery-D
Distributed
Model Storage
• Adaptable to
different back-end
storages
• Agnostic to
graph repres.
• TripleStores
(RDF), EMF,
Property
graph
Model Access
Adapter
• Surrogate key
to identify
distributed
elements
• Graph manip.
API
• Change
notifications
Distributed
Indexer
• Type-instance
indices, etc.
• Stored on
multiple
servers
• Protects against
exceeding
memory limits
Distributed
Query Evaluator
• Distributed
RETE network
• Distributed
termination
protocol
• Constructed
and deployed
by coordinator
node
Decouple and separately distribute Storage, Indexer and Query layers
34. (1) Loading a Query
Load
Model
Update
Model
Request
Result
Deploy
RETE
RETE
Network
Allocate
RETE
Cloud
Infra-structure
Construct
RETE
Load
Query
Construct RETE
• From EMF-IncQuery specs
• Should incorporate
infrastructure constraints
Deploy RETE
• Managed by a
coordinator node
• Intelligent sharding of
RETE nodes
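The slides do not say how Rete nodes are assigned to hosts. Purely as an illustration of an allocation that respects an infrastructure constraint, the sketch below uses first-fit decreasing on estimated memory; the technique, the node names and the memory figures are all assumptions.

```python
def allocate(node_memory, host_capacity):
    """First-fit decreasing: place each Rete node on the first host with room.

    node_memory: node name -> estimated memory need
    host_capacity: host name -> available memory (same unit)
    """
    free = dict(host_capacity)
    placement = {}
    # Largest nodes first; ties are broken by name for a deterministic result.
    for node, need in sorted(node_memory.items(), key=lambda kv: (-kv[1], kv[0])):
        host = next((h for h in sorted(free) if free[h] >= need), None)
        if host is None:
            raise ValueError(f"no host can fit {node} (needs {need})")
        free[host] -= need
        placement[node] = host
    return placement

node_memory = {"join1": 6, "join2": 5, "antijoin": 3, "production": 1}
host_capacity = {"server1": 8, "server2": 8}
placement = allocate(node_memory, host_capacity)
print(placement)
```

A real deployment would also have to weigh network traffic between nodes placed on different hosts, which this sketch ignores.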
35. (2) Loading a Model
Load
Model
Update
Model
Request
Result
Model
shards
Deploy
RETE
RETE
Network
Allocate
RETE
Maintain
Result Set
Cloud
Infra-structure
Construct
RETE
Model
Access
Adapter
Load
Query
Load model
• Model traversal
• Init indexers
• Network
communication
36. (3) Updating a Model
Load
Model
Update
Model
Request
Result
Model
shards
Deploy
RETE
RETE
Network
Allocate
RETE
Maintain
Result Set
Cloud
Infra-structure
Construct
RETE
Model
Access
Adapter
Load
Query
Model manipulation
• Update messages
• Create / Delete
37. (4) Requesting Query Result
Load
Model
Update
Model
Request
Result
Model
shards
Deploy
RETE
RETE
Network
Allocate
RETE
Evaluate
Query
Maintain
Result Set
Cloud
Infra-structure
Construct
RETE
Model
Access
Adapter
Load
Query
Evaluate query
• Process incoming
messages
• Propagate along
RETE network
Retrieve results
• instantly
38. (5) Monitoring and Reconfiguration
Load
Model
Update
Model
Request
Result
Model
shards
Deploy
RETE
RETE
Network
Allocate
RETE
Evaluate
Query
Maintain
Result Set
Cloud
Infra-structure
Monitor Manage
Construct
RETE
Model
Access
Adapter
Load
Query
Visualized on a
web-based dashboard
OS metrics, JVM metrics, Akka metrics, Rete metrics
46. RETE Deployment Process
Configuration scripts for
o Deployment
o Communication
middleware
Derived by automated
code generation
o Using Eclipse technology:
EMF-IncQuery + Xtend
Query
Language
Query
Predicates
RETE
Structure
Platform
Description
Allocation /
Mapping
Deployment
Descriptor
48. The Train Benchmark
Model validation workload:
o User edits the model
o Instant validation of
well-formedness constraints
o Model is repaired accordingly
Scenario:
o Load
o Check
o Edit
o Re-Check
Models:
o Randomly generated
o Close to real world instances
o Following different metrics
o Customized distributions
o Low number of violations
Queries:
o Two simple queries
(2 objects, attributes)
o Two complex queries
(4-7 joins, negation, etc.)
o Validated match sets
Incremental validation
Batch validation
Instance
model
Read Check ! Edit ReCheck
100x
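The scenario can be sketched as a small timing harness. This is not the Train Benchmark code: the function names are hypothetical, the "100x" label is assumed to mean 100 rounds of Edit followed by Re-Check, and a toy model stands in for a real tool.

```python
import time

def run_scenario(load, check, edit, iterations=100):
    """Run Load, Check, then `iterations` rounds of Edit + Re-Check,
    accumulating the wall-clock time of each phase."""
    timings = {"load": 0.0, "check": 0.0, "edit": 0.0, "recheck": 0.0}

    def timed(phase, fn, *args):
        start = time.perf_counter()
        value = fn(*args)
        timings[phase] += time.perf_counter() - start
        return value

    model = timed("load", load)
    first = timed("check", check, model)
    last = first
    for _ in range(iterations):
        timed("edit", edit, model, last)
        last = timed("recheck", check, model)
    return first, last, timings

# Toy stand-in for a tool under test: switches 8 and 9 have no signal.
def load():
    return {"switches": set(range(10)), "with_signal": set(range(8))}

def check(model):
    return model["switches"] - model["with_signal"]

def edit(model, violations):
    if violations:                       # repair one violation per round
        model["with_signal"].add(min(violations))

first, last, timings = run_scenario(load, check, edit)
```

With a batch tool the `recheck` phase costs about as much as `check`; an incremental tool is expected to move most of the cost into `load` and `check`.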
49. Evaluation of distributed scalability
Extensions to previous work (single workstation)
o Generation of large instance models
o Distributed, parallel loading of models
o Distributed transformation and validation
                                Benchmark            Distributed benchmark
Model size                      1K – 13M             1K – 88M
Load method                     Batch                Distributed, parallel
Transformation and validation   Single workstation   Multiple servers
50. Batch graph scenario and Incremental scenario – IncQuery-D
Batch graph scenario:
o Load and first validation: load the graph to the databases,
execute the query and retrieve the results
o Transformation: query the graph and delete some elements
o Revalidation: execute the query
Incremental scenario (IncQuery-D):
o Load and first validation: load the graph to the databases,
initialize the Rete net and retrieve the results
o Transformation: incrementally query the graph and
delete some elements, propagate the changes
o Revalidation: retrieve the results from the Rete net
Load and first
validation
GraphML Transformation Revalidation
DB shards Result set
Rete net
DB shards Result set
Rete net
51. Benchmark environment
Private cloud
Different DBMSs
Query
o The DBMS’s own query language
o IncQuery-D
SPARQL Gremlin
52. Load and first validation
[Plot: runtime [s] (log scale, 1 to 4096) vs. model size [million elements]; tools compared: 4store, Titan, IncQuery-D]
• 55M model: approx. 15 minutes
• Rete network’s initialization overhead pays off
53. Model modification
[Plot: runtime [s] (log scale, 1 to 256) vs. model size [million elements]; tools compared: 4store, Titan, IncQuery-D]
1. Elementary model query
2. Model modification
2 orders of magnitude
– Query from the Rete network’s indexer
– Propagation of modifications is fast
54. Revalidation
[Plot: runtime [s] (log scale, 0.01 to 128) vs. model size [million elements]; tools compared: 4store, Titan, IncQuery-D; a memory limit is marked]
• Different characteristics
• Sub-second response time for models with 88M elements
55. Benchmarking Conclusions
Memory consumption
o Single workstation: 13M model, 4 GB
o Cloud of four servers: 55M model, 4×8 GB
Runtime
o Same order of magnitude and similar characteristics to
the single workstation tool
INCQUERY-D is scalable and significantly more efficient for query
evaluation than the native query engines in 4store, Titan and Neo4j
56. Applications of Distributed Incremental Queries
• Query based optimistic locking
• Queries for Attribute Based Access
Control
Collaborative
Modeling (MONDO)
• Events vs. Changes
• Handle compound changes as events
Complex Event
Processing w/
compound changes
• System evolves along operations
• Cost / objectives associated with
states + environment + trajectory
Rule-based Design
Space Exploration
58. TRANS-IMA Project (Avionics)
Goal: Allocate SW components to
ARINC653 compliant IMA platform
Functional
Architecture
Component
database
Platform
description
Allocation
Integrated
System
Model
Inputs:
• Platform Independent Model (PIM)
(functional + nonfunc. reqs; Simulink)
• Platform Description Model (PDM)
for ARINC 653 (DSL)
Output:
• Integrated system model
• Ready for simulation
• End-to-end traceability
59. Designing ARINC653 configurations
SW functionality
(critical + non-critical)
Supply fresh air
Supply hot air
Monitor
temperature
Set
temperature
Pack
Controller
Zone
Controller
3
System
Display
AirCond
Panel
3
Redundancy
requirement
60. Job instances, Partitions, Modules
SW functionality
(critical + non-critical)
Pack
Controller
Zone
Controller
3
System
Display
AirCond
Panel
3
Job instances
1
2 3
4
5 6
7
8
Partitions
Modules
Constraints
2
5
3
4
8
8
8
8
Memory needs
+ constraints
Do not mix critical
and non-crit. jobs
Do not mix
instances of the
same critical job
Additional constraints
• WCET,
• scheduling, etc.
• interfaces
• datatypes
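The three constraints named on the slide can be sketched as a checker over a candidate allocation. The scope of each rule is an assumption here (all three are checked per partition), and the job names, criticality flags and memory figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class JobInstance:
    job: str        # name of the SW function the instance belongs to
    index: int      # instance number (redundant copies share `job`)
    critical: bool
    memory: int     # memory need, arbitrary unit

def violations(partitions, capacity):
    """partitions: name -> list of JobInstance; capacity: name -> memory budget."""
    found = []
    for name, jobs in partitions.items():
        if len({j.critical for j in jobs}) > 1:
            found.append((name, "mixes critical and non-critical jobs"))
        critical_jobs = [j.job for j in jobs if j.critical]
        if len(critical_jobs) != len(set(critical_jobs)):
            found.append((name, "holds two instances of the same critical job"))
        if sum(j.memory for j in jobs) > capacity[name]:
            found.append((name, "exceeds memory budget"))
    return found

partitions = {
    "p1": [JobInstance("PackController", 1, True, 2),
           JobInstance("PackController", 2, True, 5)],
    "p2": [JobInstance("ZoneController", 1, True, 3),
           JobInstance("SystemDisplay", 1, False, 4)],
    "p3": [JobInstance("AirCondPanel", 1, False, 4),
           JobInstance("SystemDisplay", 2, False, 5)],
}
capacity = {"p1": 8, "p2": 8, "p3": 8}
problems = violations(partitions, capacity)
```

In the approach of the talk such rules would be written as model queries, so that a violation shows up as a match.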
61. Allocating communication channels
SW functionality
Pack
Controller
Zone
Controller
3
System
Display
AirCond
Panel
3
1
2
3
7
4
5
6
8
Communication
channels
Humidity
Pressure
Temperature
62. Design Space Exploration
Design Space Exploration
Design
Alternative 1
Design
Alternative 2
Design
Alternative 3
Design
Alternative 4
Objectives
Global
Constraints
Initial Design
Solvers
• CLP solvers (Choco)
• model finders (Alloy)
• meta-heuristics +
multi-objective optimization
63. Design Space Exploration (Example)
Consistency Analysis
Design
Alternative 1
Design
Alternative 2
Design
Alternative 3
Design
Alternative 4
Objectives
Global
Constraints
Initial Design
A
B
x=2
C
A
A A
B
x=?
C
I1 I2
64. Design Space Exploration (Example)
+ Filled
Attributes
Consistency Analysis
Design
Alternative 1
Design
Alternative 2
Design
Alternative 3
Design
Alternative 4
Objectives
Global
Constraints
Initial Design
A
B
x=2
C
x=?
A
B
x=5
C
C
C
A
C
O
C1
C2
+ Objects
I1 I2
+ Relations
65. Rule Based Design Space Exploration
Design Space Exploration
Design
Alternative 1
Design
Alternative 2
Design
Alternative 3
Design
Alternative 4
Objectives
Global
Constraints
Operations
Initial Design
Special state space exploration
• potentially infinite state space
• “dense” solution space
66. Rule-Based Guided Design Space Exploration
Design Space Exploration
Seq of Transf.
Rules 1
Seq of Transf.
Rules 2
Seq of Transf.
Rules 3
Seq of Transf.
Rules 4
Model queries
as Objectives
Model queries
as Constraints
Transf. Rules
as Operations
Initial
Model
Guidance for exploration: Hints
• designer / end user
• formal analysis
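The mapping of queries to constraints and objectives, and of transformation rules to operations, can be sketched as a breadth-first state space exploration. This is a generic illustration, not the exploration strategy of the talk: states are plain tuples, the goal predicate stands in for an objective, no hints are used, and the memory needs are hypothetical.

```python
from collections import deque

def explore(initial, operations, constraint, goal, max_depth=10):
    """Breadth-first exploration: returns (operation sequence, state) for the
    first goal state found, or None if there is none within max_depth."""
    queue = deque([(initial, [])])
    seen = {initial}
    while queue:
        state, trace = queue.popleft()
        if goal(state):
            return trace, state
        if len(trace) == max_depth:
            continue
        for name, op in operations:
            nxt = op(state)                  # apply a transformation rule
            if nxt is None or nxt in seen or not constraint(nxt):
                continue                     # not applicable, known, or invalid
            seen.add(nxt)
            queue.append((nxt, trace + [name]))
    return None

NEEDS = (2, 5, 3, 4)   # memory needs of four jobs (hypothetical)
LIMIT = 8              # memory budget of each partition (hypothetical)

def assign(partition):
    def op(state):     # state = (index of next job, load of p1, load of p2)
        i, loads = state[0], list(state[1:])
        if i == len(NEEDS):
            return None
        loads[partition] += NEEDS[i]
        return (i + 1, *loads)
    return op

operations = [("to_p1", assign(0)), ("to_p2", assign(1))]
solution = explore((0, 0, 0), operations,
                   constraint=lambda s: all(load <= LIMIT for load in s[1:]),
                   goal=lambda s: s[0] == len(NEEDS))
print(solution)  # (['to_p1', 'to_p1', 'to_p2', 'to_p2'], (4, 7, 7))
```

Hints from the designer or from formal analysis would replace the plain queue order with a priority that steers which states are expanded first.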