Optimization of Incremental Queries CloudMDE2015

Presentation of the research paper "Optimization of Incremental Queries", published at the CloudMDE2015 conference.

  1. Optimization of Incremental Queries in the Cloud. József Makai, Gábor Szárnyas, Ákos Horváth, István Ráth, Dániel Varró. Budapest University of Technology and Economics, Department of Measurement and Information Systems, Fault Tolerant Systems Research Group.
  2. INCQUERY-D: DISTRIBUTED INCREMENTAL MODEL QUERIES
  3. Incremental Query Evaluation by RETE
     - Example query: an AUTOSAR well-formedness validation rule
     - [Figure: instance model with a valid and an invalid model fragment; element types: communication channel, logical signal, mapping, physical signal.]
  4. Incremental Query Evaluation by RETE
     - Steps shown: fill the input nodes; fill the worker nodes; read the result set; modify the model; propagate the changes; read the changes in the result set (deltas).
     - [Figure: Rete network over communication channel, logical signal, mapping, and physical signal, with two join nodes, one antijoin node, and the result set.]
  5. Goals of IncQuery-D
     - Objectives
       - Distributed incremental pattern matching
       - Adaptation of IncQuery tooling to graph DBs
       - Executed over cloud infrastructure (COTS hardware)
     - Achieve scalability by avoiding the memory bottleneck
       - Sharding separately: data, indexers, query network
       - In memory: index + query
     - Assumptions
       - All Rete nodes fit on a server node
       - Indexers can be filled efficiently
       - Modification size ≪ model size
       - The application requires the complete result set of the query (as opposed to just one match)
  6. INCQUERY-D Architecture
     - [Figure: Servers 0 to 3, each with a database shard (shards 0 to 3). Labels: Transaction; Rete net; Indexer layer; INCQUERY-D; Distributed query evaluation network; Distributed indexer; Model access adapter; Distributed indexing, notification; Distributed persistent storage; Distributed production network.]
     - Each intermediate node can be allocated to a different host
     - Remote internode communication
  7. INCQUERY-D Architecture
     - [Figure: Servers 0 to 3, each with a database shard (shards 0 to 3). Labels: Transaction; In-memory EMF model; Indexer layer; INCQUERY-D; four Indexer nodes; two Join nodes; one Antijoin node.]
     - Technologies named: Akka; triple store (4store), document DB (Mongo), RDF over column family (Cumulus)
  8. RETE Deployment Process
     - Stages shown: Query Language, Query Predicates, RETE Structure, Platform Description, Allocation / Mapping, Deployment Descriptor
     - Example query:
       pattern routeSensor(sensor: Sensor) = { TrackElement.sensor(switch, sensor); Switch(switch); SwitchPosition.switch(sp, switch); SwitchPosition(sp); Route.switchPosition(route, sp); Route(route); neg find head(route, sensor); }
       pattern head(R, Sen) = { Route.routeDefinition(R, Sen); }
     - [Figure: pattern graph with nodes route: Route, sp: SwitchPosition, switch: Switch, sensor: Sensor and edges switchPosition, switch, sensor, routeDefinition.]
  9. RETE Deployment Process (Query Predicates)
     - Construct language-independent constraints
     - Resolution of syntactic sugar and type information
     - Example:
       - Variables: route, sp, switch
       - Parameter: sensor
       - Constraints: Edge: SwitchPosition.switch; Edge: TrackElement.sensor; Edge: Route.switchPosition; Negation: head
  10. RETE Deployment Process (RETE Structure)
     - Construct RETE structure (platform independently)
     - Optimizations: model statistics, expected usage profile
     - [Figure: three join nodes.]
  11. RETE Deployment Process (Platform Description)
     - Architecture model (cloud infrastructure)
       - Virtual machines: memory limits, CPU speed, storage capacity
       - Communication channels: bandwidth
     - Specified by a textual DSL (Xtext)
     - [Figure: four machines, numbered 1 to 4.]
  12. RETE Deployment Process (Allocation / Mapping)
     - Machine 1: In1, In2, Join2
     - Machine 2: In3
     - Machine 3: In4
     - Machine 4: Join1, Join3
     - Allocation can be optimized for query performance and other beneficial system characteristics!
  13. RETE Deployment Process (Deployment Descriptor)
     - Configuration scripts for deployment and communication middleware
     - Derived by automated code generation, using Eclipse technology: EMF-IncQuery + Xtend
  14. ALLOCATION OPTIMIZATION IN INCQUERY-D
  15. Motivation for Allocation Optimization
     - Considering data-intensive systems: overuse of resources, cost of the system, overhead of network communication
     - Data transmission time (t′) is a significant component of global execution time, next to local job execution time (t)
     - Network links can have different capacities
     - Poor utilization leads to an expensive system
     - [Figure labels: 4000 MB, Process, 2000 MB, Process, 500 MB, Process, 2400 MB, $$$.]
  16. The Allocation Problem
     - Inputs
       - Rete network for the query, organized into processes, with resource consumption
       - Available infrastructure with important resource parameters
     - Allocation constraints
     - Output: valid allocation
     - Optimization targets
     - [Figure: Rete network of two input nodes, a worker node, and a production node (numbered 1 to 4) with memory labels 500 MB, 3200 MB, 2400 MB, and 600 MB; two machines with 5000 MB and 6000 MB.]
  17. Opt. Target: Communication Minimization
     - [Figure: the same Rete network (500 MB, 3200 MB, 2400 MB, 600 MB) with edge volumes 1,000,000, 200,000, and 200,000, allocated to two machines of 6000 MB and 5000 MB.]
     - First allocation: 1 × 1,000,000 + 3 × 200,000 + 3 × 200,000; Communication = 2,200,000
     - Second allocation: 3 × 1,000,000 + 1 × 200,000 + 1 × 200,000; Communication = 3,400,000
     - In the first allocation, the largest volume of data is sent through the faster local link
  18. Opt. Target: Cost Minimization
     - [Figure: the same Rete network (500 MB, 3200 MB, 2400 MB, 600 MB; two input nodes, a worker node, a production node, numbered 1 to 4).]
     - Machines: 1 (4000 MB, $5), 2 (4000 MB, $5), 3 (6500 MB, $7)
     - Two allocations are shown: Cost = 10 and Cost = 12
  19. Heuristics in Optimization
     - Memory consumption of Rete nodes and processes: memory usage of input nodes can be estimated
     - Communication intensity of network communication channels
     - [Figure: model database, number of model elements, and a "?? MB" input node; Rete network of input, worker, and production nodes with numeric annotations 1 to 4.]
  20. Performance Impact of Optimization
     - [Chart: first evaluation time of a complex query. X-axis: model size (number of elements): 61K, 213K, 867K, 3M, 13M. Y-axis: time (sec), ticks 28, 45, 72, 114, 182, 290, 463, 739. Series: Max. memory, Naive optimization, Communication optimization. Data labels: 739, 616, 194, 144.]
     - Callouts: "2 minutes gain!"; "This approach doesn’t work for larger models!"
  21. Network Traffic Statistics (network traffic in megabytes)
     - Unoptimized
       - Remote traffic: vm0 300, vm1 349, vm2 371, total 1020
       - Local traffic: vm0 14, vm1 2, vm2 74, total 90
       - Total traffic: 1110
     - Optimized
       - Remote traffic: vm0 248, vm1 280, vm2 347, total 875
       - Local traffic: vm0 24, vm1 20, vm2 190, total 234
       - Total traffic: 1109
  22. Conclusion and Future Work
     - Results
       - Novel approach for application-specific resource allocation optimization for distributed Rete
       - CPLEX-based implementation for IncQuery-D
       - Preliminary evaluation results
         - Significant improvements for local resource management
         - Performance gains especially over slow / inhomogeneous networks
         - Efficient optimization execution (supported by runtime cutoff in CPLEX)
     - Future work
       - Hadoop / YARN support (new IncQuery-D developments)
         - Support configuration optimization for other Hadoop-based cloud apps
       - Static allocation → dynamic reallocation
         - Take existing configuration as a starting constraint set
         - Optimize for changed workload conditions
  23. New INCQUERY-D Architecture
     - New INCQUERY-D: “Hadoop over Docker”
     - [Figure: Docker containers 0 to 3, each with a database shard (shards 0 to 3). Labels: Transaction; In-memory EMF model; Indexer layer; four Indexer nodes; two Join nodes; one Antijoin node.]
     - YARN resource management
     - ZooKeeper monitoring
     - Akka actors embedded into long-running Hadoop jobs
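
The propagation steps listed on slide 4 (fill the nodes, modify the model, propagate the changes, read the deltas) can be illustrated with a minimal sketch of a single incremental join node. This is not IncQuery-D code: the `JoinNode` class, its `update` method, and the sample keys and values are hypothetical names chosen for illustration. A full Rete network would chain several such nodes, add antijoin nodes, and distribute them across hosts.

```python
from collections import defaultdict


class JoinNode:
    """Hypothetical two-input join node that caches both inputs by join key."""

    def __init__(self):
        self.left = defaultdict(set)   # key -> values seen on the left input
        self.right = defaultdict(set)  # key -> values seen on the right input

    def update(self, side, key, value, sign):
        """Apply one change (+1 insert, -1 delete) and return the output deltas."""
        if side == "left":
            own, other = self.left, self.right
        else:
            own, other = self.right, self.left
        present = value in own[key]
        if (sign > 0) == present:
            return []  # duplicate insert or delete of an unknown value: no change
        if sign > 0:
            own[key].add(value)
        else:
            own[key].remove(value)
        # Only tuples that join with the changed value are touched.
        matches = sorted(other[key])
        if side == "left":
            return [(sign, (key, value, m)) for m in matches]
        return [(sign, (key, m, value)) for m in matches]


node = JoinNode()
# Initial fill: the first tuple finds no partner yet, the second completes a match.
d1 = node.update("left", "sp1", "route1", +1)
d2 = node.update("right", "sp1", "switch1", +1)
# Model modification: removing the left tuple retracts exactly that match.
d3 = node.update("left", "sp1", "route1", -1)
```

Each call returns only the changes to the node's output, so the cost of an update depends on the size of the modification rather than on the size of the whole model, which matches the "modification size ≪ model size" assumption on slide 5.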
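
The two communication totals on slide 17 can be reproduced with a short weighted sum. The sketch below assumes that the factors 1 and 3 on the slide are link weights for local and remote channels, and that the three edge volumes connect the nodes as listed in `EDGES`. The slide text does not state the exact topology, so the node names, `EDGES`, and `communication_cost` are illustrative assumptions.

```python
LOCAL_WEIGHT = 1   # assumed weight of a channel between nodes on the same machine
REMOTE_WEIGHT = 3  # assumed weight of a channel that crosses machines

# Assumed topology: (source node, target node, tuples sent over the channel).
EDGES = [
    ("in1", "worker", 1_000_000),
    ("in2", "worker", 200_000),
    ("worker", "production", 200_000),
]


def communication_cost(allocation, edges=EDGES):
    """Sum the edge volumes, weighted by whether each edge stays on one machine."""
    total = 0
    for source, target, volume in edges:
        same_machine = allocation[source] == allocation[target]
        total += (LOCAL_WEIGHT if same_machine else REMOTE_WEIGHT) * volume
    return total


# The heavy edge stays local, the two light edges go remote.
first = {"in1": 1, "worker": 1, "in2": 2, "production": 2}
# The heavy edge goes remote, the two light edges stay local.
second = {"in1": 2, "worker": 1, "in2": 1, "production": 1}

first_cost = communication_cost(first)    # 1*1,000,000 + 3*200,000 + 3*200,000
second_cost = communication_cost(second)  # 3*1,000,000 + 1*200,000 + 1*200,000
```

Under these assumptions the first allocation yields 2,200,000 and the second 3,400,000, the two values shown on the slide.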
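
The cost target on slide 18 can be sketched as a tiny search over all allocations. The presentation reports a CPLEX-based implementation; the brute-force enumeration below is a stand-in that is only practical for a handful of nodes. It assumes that an allocation is valid when no machine exceeds its memory, and that the cost is the summed price of the machines actually used. The node names `n1` to `n4` are hypothetical, since the slide text does not say which memory figure belongs to which Rete node.

```python
from itertools import product

# Memory demand per Rete node in MB (figures from the slide, node names assumed).
NODE_MEMORY_MB = {"n1": 500, "n2": 3200, "n3": 2400, "n4": 600}

# Machine id -> (memory capacity in MB, price in dollars), as shown on the slide.
MACHINES = {1: (4000, 5), 2: (4000, 5), 3: (6500, 7)}


def is_valid(allocation):
    """Check that no machine is assigned more memory than it has."""
    used = {machine: 0 for machine in MACHINES}
    for node, machine in allocation.items():
        used[machine] += NODE_MEMORY_MB[node]
    return all(used[m] <= MACHINES[m][0] for m in MACHINES)


def cost(allocation):
    """Price of every machine that hosts at least one node."""
    return sum(MACHINES[m][1] for m in set(allocation.values()))


def cheapest_allocation():
    """Enumerate all node-to-machine mappings and keep the cheapest valid one."""
    nodes = list(NODE_MEMORY_MB)
    best = None
    for choice in product(MACHINES, repeat=len(nodes)):
        allocation = dict(zip(nodes, choice))
        if is_valid(allocation) and (best is None or cost(allocation) < cost(best)):
            best = allocation
    return best


best = cheapest_allocation()
# A valid but more expensive alternative that uses one $5 and the $7 machine.
alternative = {"n1": 1, "n2": 3, "n3": 3, "n4": 1}
```

With these inputs the search settles on the two $5 machines (cost 10), because the total demand of 6700 MB does not fit on any single machine, while the alternative using the 6500 MB machine costs 12.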
