In model-driven engineering (MDE), model queries are a core technology behind many tool- and transformation-specific challenges such as design rule validation, model synchronization, view maintenance, and simulation. As software models rapidly increase in size and complexity, traditional MDE tools frequently face scalability issues that decrease engineer productivity and increase development costs. Incremental graph queries offer a graph-pattern-based language for capturing queries; furthermore, the result set of a query is cached and incrementally maintained upon model changes to provide instantaneous query response times. In this talk, we first give a brief overview of the EMF-IncQuery framework (an official Eclipse subproject). Then we discuss how to evaluate incremental queries over a distributed cloud infrastructure (scaling up from a single-node tool to a cluster of nodes) deployed over popular database back-ends (such as Cassandra, 4store, and Neo4j). We present our first benchmarking experiments with IncQuery-D, which highlight that distributed incremental model queries can perform significantly better than the native query technologies of the underlying database back-end, especially for complex queries.
IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineering and Deployment Challenges
1. IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineering and Deployment Challenges
Budapest University of Technology and Economics
Department of Measurement and Information Systems
Dániel Varró
Budapest University of Technology and Economics
Fault Tolerant Systems Research Group
2. Outline of the Talk
Motivation & Background:
• Validation of design rules
• Graph pattern matching
Incremental Model Queries: the EMF-IncQuery framework
• Language
• Execution
Distributed Incremental Model Queries (IncQuery-D)
• Architecture
• Performance Benchmarks
• Distributed model load
• Incremental query evaluation
Main Contributors
o István Ráth (lead)
o Ákos Horváth
o Gábor Bergmann
o Ábel Hegedüs
o Zoltán Ujhelyi
o Benedek Izsó
o Gábor Szárnyas
o Csaba Debreceni
o Dénes Harmath
o József Makai
o Dániel Stein
4. Scalable MDE: The MONDO Project
Models and Languages:
• Large and heterogeneous
• Construction
• Visualization
Queries and Transformations:
• Executed over large models
• Incremental
• Lazy
• Parallel
Collaboration:
• Offline (SVN)
• Online (GDocs)
• Many collaborators
• Secure access
Persistent Storage:
• Efficient
• Secure
• Interoperability
Case studies:
• validate solutions through real case studies
• guided by industrial advisory board
Prototype tools:
• open source software
• open benchmarks
Academic Partners:
• Univ. York (UK), Univ. Autónoma Madrid (ES), ARMINES (FR), BME (HU)
Industrial Partners:
• The Open Group (UK), Uninova (PT), Softeam (FR), Soft-Maint (FR), IKERLAN (ES)
6. Motivation: Early validation of design rules
SystemSignalGroup design rule (from AUTOSAR)
o A SystemSignal and its group must be in the same IPdu
o Challenge: find violations quickly in large models
o New difficulties:
• reverse navigation
• complex manual solution
AUTOSAR:
• standardized SW architecture
of the automotive industry
• now supported by modern modeling tools
Design Rule/Well-formedness constraint:
• each valid car architecture must respect it
• designers are immediately notified if it is violated
Challenge:
• >500 design rules in AUTOSAR tools
• >1 million elements in AUTOSAR models
• models constantly evolve by designers
8. Validation of Well-formedness Constraints
Meta-model
Model
Query
pattern switchWOSignal(sw) {
Switch(sw);
neg find switchHasSignal(sw);
}
pattern switchHasSignal(sw) {
Switch(sw);
Signal(sig);
Signal.mountedTo(sig, sw);
}
Modify
User
Result
9. Model sizes in practice
Models with 10M+ elements are common:
o Car industry
o Avionics
o Source code analysis
Models evolve and change continuously
Validation can take hours
Application          Model size
System models        10^8
Sensor data          10^9
Geospatial models    10^12
Source: Markus Scheidgen, How Big are Models – An Estimation, 2012.
11. What is a model query?
For a programmer:
o A piece of code that searches for parts of the model
For the scientist:
o Query = set of constraints that have to be satisfied by
(parts of) the (graph) model
o Result = set of model element tuples that satisfy the
constraints of the query
o Match = bind constraint variables to model elements
A query engine supports
o the definition & execution of model queries
Query(A,B) = ⋀ᵢ condᵢ(Aᵢ, Bᵢ)
• all tuples of model elements a, b
• satisfying the query condition
• along the match A=a and B=b
• parameters A, B can be input/output
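The constraint-satisfaction reading of a query can be made concrete with a small sketch (plain Python with invented names, not the EMF-IncQuery API): variables are bound to model elements, and a tuple belongs to the result set iff it satisfies every constraint.

```python
# Hypothetical encoding of "Query = set of constraints, Result = tuples
# satisfying them, Match = binding of variables to model elements".
from itertools import product

def evaluate_query(model_elements, variables, constraints):
    """Enumerate all bindings of query variables to model elements
    that satisfy every constraint of the query."""
    results = []
    for values in product(model_elements, repeat=len(variables)):
        match = dict(zip(variables, values))    # bind the variables
        if all(cond(match) for cond in constraints):
            results.append(tuple(values))       # a tuple in the result set
    return results

# Toy model: numbers; query Q(A, B) with the single condition A < B.
matches = evaluate_query([1, 2, 3], ["A", "B"],
                         [lambda m: m["A"] < m["B"]])
# matches == [(1, 2), (1, 3), (2, 3)]
```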
12. Graph Pattern Matching for Queries
[Pattern graph L over model graph G: route: Route –switchPosition→ sp: SwitchPosition –switch→ switch: Switch –sensor→ sensor: Sensor, plus route –routeDefinition→ sensor]
Match:
o m: L → G (graph morphism)
o As a CSP:
• Variables: nodes of L
• Constraints: edges of L
• Domain values: nodes of G
o Complexity: |G|^|L|
All sensors with a switch that belongs to a route must directly be linked to the same route.
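The CSP formulation above can be sketched as brute-force enumeration (an illustrative Python encoding, not from the talk): every assignment of pattern nodes to graph nodes is tried, which is exactly the |G|^|L| worst case.

```python
# CSP view of pattern matching: variables = nodes of L, domain = nodes
# of G, constraints = edges of L. Brute force visits |G|^|L| bindings.
from itertools import product

def match_pattern(pattern_edges, pattern_nodes, graph_edges, graph_nodes):
    matches = []
    for values in product(graph_nodes, repeat=len(pattern_nodes)):
        binding = dict(zip(pattern_nodes, values))
        # every pattern edge must map onto a graph edge with the same label
        if all((binding[src], label, binding[dst]) in graph_edges
               for (src, label, dst) in pattern_edges):
            matches.append(binding)
    return matches

# Tiny instance of the railway pattern:
# route -switchPosition-> sp -switch-> sw
pattern = [("route", "switchPosition", "sp"), ("sp", "switch", "sw")]
graph = {("r1", "switchPosition", "sp1"), ("sp1", "switch", "sw1")}
found = match_pattern(pattern, ["route", "sp", "sw"],
                      graph, ["r1", "sp1", "sw1"])
# found == [{'route': 'r1', 'sp': 'sp1', 'sw': 'sw1'}]
```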
13. Graph Pattern Matching (Local Search)
[Same pattern graph, with the search steps numbered 0–4]
Search Plan:
o Select the first node to be matched
o Define an ordering on graph pattern edges
Search is restarted from scratch each time
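The search-plan idea can be sketched as follows (assumed data layout; the function and adjacency-index names are hypothetical): a fixed edge ordering lets the matcher extend partial matches edge by edge instead of enumerating all bindings.

```python
# Local-search matching driven by a search plan: an ordered list of
# pattern edges, applied to extend partial matches step by step.
def local_search(plan, adjacency, seeds):
    """plan: ordered (src_var, label, dst_var) triples;
    adjacency: {(node, label): set of neighbour nodes};
    seeds: candidate bindings for the first variable."""
    first_var = plan[0][0]
    partial = [{first_var: s} for s in seeds]
    for (src, label, dst) in plan:
        extended = []
        for binding in partial:
            for nxt in adjacency.get((binding[src], label), ()):
                if binding.get(dst) in (None, nxt):   # consistent extension
                    extended.append({**binding, dst: nxt})
        partial = extended
    return partial

adjacency = {("r1", "switchPosition"): {"sp1"}, ("sp1", "switch"): {"sw1"}}
plan = [("route", "switchPosition", "sp"), ("sp", "switch", "sw")]
result = local_search(plan, adjacency, ["r1"])
# result == [{'route': 'r1', 'sp': 'sp1', 'sw': 'sw1'}]
```

Note that the whole search is re-run from scratch on every model change, which motivates the incremental approach on the next slide.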
14. Incremental Graph Pattern Matching
[Same pattern graph; cached partial match: route = r1, sp = sp1, switch = sw1]
Main idea: trade more space for less time
o Cache the matches of patterns
o Instantly retrieve matches (while valid)
o Update caches upon model changes
o Notify about relevant changes
Approaches:
o TREAT, LEAPS, RETE, …
o Tools: VIATRA, GROOVE, MoTE, TCore
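A minimal sketch of the caching idea (heavily simplified compared to a real Rete implementation; the class and callback names are invented): the match set is kept in memory, updated on each elementary change, and listeners are notified of the deltas.

```python
# Incremental maintenance of a single-edge pattern's match set:
# "Signal mountedTo Switch". Reads are instant; writes update the cache.
class IncrementalCache:
    def __init__(self):
        self.matches = set()      # cached result set
        self.listeners = []       # notified about relevant deltas

    def on_edge_added(self, sig, sw):
        self.matches.add((sig, sw))
        for cb in self.listeners:
            cb("added", (sig, sw))

    def on_edge_removed(self, sig, sw):
        self.matches.discard((sig, sw))
        for cb in self.listeners:
            cb("removed", (sig, sw))

cache = IncrementalCache()
deltas = []
cache.listeners.append(lambda kind, m: deltas.append((kind, m)))
cache.on_edge_added("sig1", "sw1")    # model change -> cache + notification
cache.on_edge_removed("sig1", "sw1")
# deltas == [("added", ("sig1", "sw1")), ("removed", ("sig1", "sw1"))]
```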
15. Batch vs. Live Query Scenarios
Batch query
(pull / request-driven):
1. Designer selects a query
2. One/All matches are
calculated
3. Rule is applied on one/all
matches
4. All Steps 1-3 are redone if
model changes
Query results obtained
upon designer demand
Live query
(push / event-driven):
1. Model is loaded
2. Rule system is loaded
3. Calculate full match set
4. Model is changed (rules
fired or designer updates)
5. Iterate Steps 3 and 4 until
rule system is stopped
Query results are always
available for designer
17. EMF-IncQuery: An Open Source Eclipse Project
Definition:
• Declarative graph query language
• Transitive closure, negative conditions, etc.
• Compositional, reusable
Execution:
• Incremental evaluation
• Cache result set
• Maintain incrementally upon model change
Features:
• Derived features
• On-the-fly validation
• View generation, notifications, soft links, databinding
http://eclipse.org/incquery
18. The IncQuery (IQ) Graph Query Language
[Pattern graph: route: Route –switchPosition→ sp: SwitchPosition –switch→ switch: Switch –sensor→ sensor: Sensor, plus route –routeDefinition→ sensor]
IQ: declarative query language
o Attribute constraints
o Local + global queries
o Compositionality + Reusability
o Recursion, Negation, Transitive Closure
o Syntax: DATALOG style
pattern routeSensor(sensor: Sensor) = {
	TrackElement.sensor(switch, sensor);
	Switch(switch);
	SwitchPosition.switch(sp, switch);
	SwitchPosition(sp);
	Route.switchPosition(route, sp);
	Route(route);
	neg find head(route, sensor);
}
pattern head(R, Sen) = {
	Route.routeDefinition(R, Sen);
}
ModelQuery(A,B):
• tuples of model elements A, B
• satisfying the query condition
• enumerate one / all instances
• A, B can be input or output
19. TOOL DEMO: INCQUERY Development Tools
Query Explorer
Pattern Editor
Queries are applied & updated on-the-fly
• Works with most EMF
editors out-of-the-box
• Reveals matches as
selection
20. Incremental Query Evaluation by RETE
AUTOSAR well-formedness validation rule
Communication
channel
Logical signal Mapping Physical signal
Instance model
Invalid model fragment
Valid model fragment
21. Incremental Query Evaluation by RETE
• Load the model into the Rete network
• Propagate the updates through the network
• Read the changes in the result set (deltas)
[Rete network: indexers feeding two join nodes and an antijoin node, maintaining the result set for the communication channel / logical signal / mapping / physical signal pattern]
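The delta propagation through a join node can be sketched like this (a toy structure in Python with invented names, assuming a drastically simplified Rete node; not IncQuery-D code): each side of the join is indexed on the shared variable, and an insertion emits only the newly completed matches as deltas.

```python
# A Rete-style join node: stores partial results from both inputs
# and propagates only the *changes* to its sink.
class JoinNode:
    """Joins (a, b) tuples from the left with (b, c) from the right on b."""
    def __init__(self, sink):
        self.left, self.right, self.sink = {}, {}, sink

    def left_insert(self, a, b):
        self.left.setdefault(b, set()).add(a)
        for c in self.right.get(b, ()):        # emit only new combinations
            self.sink(("insert", (a, b, c)))

    def right_insert(self, b, c):
        self.right.setdefault(b, set()).add(c)
        for a in self.left.get(b, ()):
            self.sink(("insert", (a, b, c)))

result_set = []
join = JoinNode(result_set.append)
join.left_insert("route1", "sp1")     # no counterpart yet, nothing emitted
join.right_insert("sp1", "switch1")   # completes a match -> delta emitted
# result_set == [("insert", ("route1", "sp1", "switch1"))]
```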
22. Performance of EMF-INCQUERY
Incremental graph queries based on Rete, built for the Eclipse Modeling Framework
[Chart: runtime vs. model size for batch and incremental queries]
Runtime is proportional to the size of the modification.
23. Performance of EMF-INCQUERY
[Chart: memory consumption vs. model size for incremental and batch queries, with the memory limit marked]
Storing partial results increases memory consumption.
24. Selected Applications (EMF-IncQuery)
Toolchain for IMA configs:
• Complex traceability
• Query-driven views
• Abstract models by derived objects
MATLAB-EMF Bridge:
• Connect to Matlab Simulink model
• Export: Matlab2EMF
• Change model in EMF
• Re-import: EMF2Matlab
Gesture recognition:
• Live models (refreshed at 25 frames/s)
• Complex event processing
Detection of bad code smells:
• Experiments on open-source Java projects
• Local search vs. incremental vs. native Java code
Design Space Exploration:
• Rules for operations
• Complex structural constraints (as graph patterns)
• Hints and guidance
• Potentially infinite state space
Known Users:
• Itemis (developer) • Embraer • Thales • ThyssenKrupp • CERN
26. Goals of INCQUERY-D
Objectives
o Distributed incremental pattern matching
o Adaptation of EMF-INCQUERY’s tooling to graph DBs
o Executed over cloud infrastructure (COTS hardware)
Achieve scalability by avoiding memory bottleneck
o Sharding separately
• Data
• Indexers
• Query network
o In memory:
• Index + Query
Assumptions
• All Rete nodes fit on a server node
• Indexers can be filled efficiently
• Modification size ≪ model size
• The application requires the complete result set of the query (as opposed to just one match)
27. Dimensions of Scalability
Infrastructure
o Number of machines
o Available memory / CPU
o Network performance
o Number of concurrent users
Model
o Model size
o Model characteristics
Queries
o Number of queries
o Query complexity
Metrics
28. From EMF-INCQUERY to INCQUERY-D
EMF-INCQUERY stack (from bottom up): in-memory EMF model → indexer layer → Rete net → transaction
• In-memory storage
• Indexing
• Production network: stores intermediate query results, propagates changes
29. INCQUERY-D Architecture
[Diagram: database shards 0–3 hosted on Servers 0–3; INCQUERY-D adds a transaction layer, a Rete net and an indexer layer on top]
o Distributed query evaluation network
o Distributed indexer and model access adapter (distributed indexing, notification)
o Distributed persistent storage
Distributed production network:
• Each intermediate node can be allocated to a different host
• Remote inter-node communication
30. INCQUERY-D Architecture
[Diagram: transaction layer and in-memory EMF model on Server 0; Join, Join and Antijoin Rete nodes on Servers 1–3; four indexers in the indexer layer over database shards 0–3; nodes communicate via Akka]
Back-end options: triple store (4store), document DB (MongoDB), RDF over column family (Cumulus)
31. Termination Protocol in INCQUERY-D
[Diagram: the distributed Rete network of Join, Join and Antijoin nodes with indexers over database shards 0–3]
o A stack is added to each update message
• It registers the Rete nodes the message passes through
o When a production node is reached, an ACK message is sent back along the stack
o The user then retrieves a consistent query result
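The stack-based protocol can be sketched as follows (message shapes and node representation are invented for illustration; real IncQuery-D uses asynchronous Akka actors): each update message records the Rete nodes it traverses, and reaching the production node triggers ACKs back along that recorded path, so the sender knows when the update is fully processed.

```python
# Termination protocol sketch: a stack on each update message registers
# the Rete nodes it passes through; the production node ACKs them back.
def propagate(node, payload, stack, acks):
    stack = stack + [node["name"]]            # register this Rete node
    if node.get("production"):
        # production node reached: acknowledge every node on the path
        for visited in reversed(stack):
            acks.append(("ack", visited))
    else:
        for child in node["children"]:
            propagate(child, payload, stack, acks)

# A tiny three-node network: indexer -> join -> production node.
production = {"name": "result", "production": True}
join = {"name": "join", "children": [production]}
indexer = {"name": "indexer", "children": [join]}

acks = []
propagate(indexer, ("insert", ("sig1", "sw1")), [], acks)
# acks == [("ack", "result"), ("ack", "join"), ("ack", "indexer")]
```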
32. IncQuery-D Architectural Layers
High-Level Query Language (Gremlin, Cypher; SPARQL; IQPL (IncQuery)):
• Global queries
• Complex navigations
Low-Level Query Language (Distributed Indexers (MONDIX); SPARQL):
• Efficient element access by indices
• Local queries
Distributed Graph DB (Cayley; Titan; 4store):
• Can be transparent (via indexers)
• Integrates popular graph storages
Native Storage (MongoDB; Cassandra; 4store):
• Efficient NoSQL storages
• Triple stores
Storage Format (RDF; XMI / Ecore; Property Graphs):
• Standardized data formats
• Popular interchange formats
33. Summary: Key Components of IncQuery-D
Distributed Model Storage:
• Adaptable to different back-end storages
• Agnostic to graph representation: triple stores (RDF), EMF, property graphs
Model Access Adapter:
• Surrogate keys to identify distributed elements
• Graph manipulation API
• Change notifications
Distributed Indexer:
• Type-instance indices, etc.
• Stored on multiple servers
• Protects against exceeding memory limits
Distributed Query Evaluator:
• Distributed RETE network
• Distributed termination protocol
• Constructed and deployed by a coordinator node
Decouple and separately distribute the Storage, Indexer and Query layers.
35. (1) Loading a Query
[Workflow: Load Query → Construct RETE → Allocate RETE → Deploy RETE → RETE Network, running over the cloud infrastructure; later steps: Load Model, Update Model, Request Result]
Construct RETE:
• From EMF-IncQuery specifications
• Should incorporate infrastructure constraints
Deploy RETE:
• Managed by a coordinator node
• Intelligent sharding of RETE nodes
36. (2) Loading a Model
[Workflow as before, now including the model shards, the Model Access Adapter and result-set maintenance]
Load model:
• Model traversal
• Init indexers
• Network communication
37. (3) Updating a Model
[Workflow as before]
Model manipulation:
• Update messages
• Create / Delete
38. (4) Requesting Query Result
[Workflow as before, now including the Evaluate Query step]
Evaluate query:
• Process incoming messages
• Propagate along the RETE network
Retrieve results:
• instantly
39. (5) Monitoring and Reconfiguration
[Workflow as before, extended with a Monitor & Manage component]
Visualized on a web-based dashboard:
• OS metrics • JVM metrics • Akka metrics • Rete metrics
47. RETE Deployment Process
Configuration scripts for:
o Deployment
o Communication middleware
Derived by automated code generation:
o Using Eclipse technology: EMF-IncQuery + Xtend
[Pipeline: Query Language → Query Predicates → RETE Structure; Platform Description → Allocation / Mapping → Deployment Descriptor]
49. The Train Benchmark
Model validation workload:
o User edits the model
o Instant validation of
well-formedness constraints
o Model is repaired accordingly
Scenario:
o Load
o Check
o Edit
o Re-Check
Models:
o Randomly generated
o Close to real world instances
o Following different metrics
o Customized distributions
o Low number of violations
Queries:
o Two simple queries
(<2 objects, attributes)
o Two complex queries
(4-7 joins, negation, etc.)
o Validated match sets
[Scenario: Read → Check → (Edit → Re-Check) ×100 on the instance model, run both as batch validation and as incremental validation]
50. Evaluation of distributed scalability
Extensions to previous work (single workstation)
o Generation of large instance models
o Distributed, parallel loading of models
o Distributed transformation and validation
                               Benchmark            Distributed benchmark
Model size                     1K – 13M             1K – 88M
Load method                    Batch                Distributed, parallel
Transformation and validation  Single workstation   Multiple servers
51. Batch scenario vs. incremental scenario (IncQuery-D)
Batch scenario:
• Load and first validation: load the graph to the databases and execute the query
• Transformation: query the graph and delete some elements
• Revalidation: execute the query
Incremental scenario (IncQuery-D):
• Load and first validation: load the graph, initialize the Rete net and retrieve the results
• Transformation: incrementally query the graph and delete some elements, propagate the changes
• Revalidation: retrieve the results from the Rete net
[Diagram: the GraphML model is loaded into the DB shards and the Rete net; the result set is maintained through the load and first validation, transformation, and revalidation phases]
52. Benchmark environment
Private cloud
Different DBMSs
Query implementations:
o The DBMS's own query language (SPARQL, Gremlin)
o IncQuery-D
53. Load and first validation
[Chart: runtime (s, log scale, 1–4096) vs. model size (million elements); series: 4store, IncQuery-D over Titan, IncQuery-D over 4store]
• 55M model: approx. 15 minutes
• The Rete network's initialization overhead pays off later
54. Model modification
[Chart: runtime (s, log scale, 1–256) vs. model size (million elements); series: 4store, IncQuery-D over Titan, IncQuery-D over 4store]
1. Elementary model query
2. Model modification
About 2 orders of magnitude faster:
– Query is answered from the Rete network's indexer
– Propagation of modifications is fast
55. Revalidation
[Chart: runtime (s, log scale, 0.01–128) vs. model size (million elements); series: 4store, IncQuery-D over Titan, IncQuery-D over 4store; the memory limit is marked]
• Different characteristics
• Sub-second response time for models with 88M elements
56. Benchmarking Conclusions
Memory consumption
o Single workstation: 13M model, 4 GB
o Cloud of four servers: 55M model, <4×8 GB
Runtime
o Same order of magnitude and similar characteristics to
the single workstation tool
INCQUERY-D is scalable and significantly more efficient for query
evaluation than the native query engines in 4store, Titan and Neo4j