Cluster-Wide Context Switch of Virtualized Jobs 
Fabien Hermenier, Adrien Lèbre, Jean-Marc Menaud 
VTDC’10, 22 June 2010 
...
Agenda 
Motivation 
Global Design 
Architecture 
Implementation 
Proof of concept 
A sample scheduler 
Experiment on a clu...
Motivation 
Agenda 
Motivation 
Global Design 
Architecture 
Implementation 
Proof of concept 
A sample scheduler 
Experim...
Motivation 
Motivation 
Clusters 
I large infrastructures to execute various jobs 
Resource Management System (RMS) 
I man...
Motivation 
Jobs schedulers 
Usually 
A corse-grain exploitation of resources : 
I static allocation of resources 
I execu...
Motivation 
Motivation 
Virtual Machines (VMs) as a backend for dynamic schedulers 
I each component is embedded into its ...
Motivation 
Proposition 
Performing the changes should not be a primary concern for developers 
I a generic cluster-wide c...
Global Design 
Agenda 
Motivation 
Global Design 
Architecture 
Implementation 
Proof of concept 
A sample scheduler 
Expe...
Global Design Architecture 
From jobs to virtualized Jobs 
Figure: The life cycle of a vjob 
I a vjob encapsulates one or ...
Global Design Architecture 
Configuration 
I describes the assignment of the running VMs to working nodes 
I nodes provide...
Global Design Architecture 
The control loop of Entropy 
Monitor 
I extract the current configuration: 
VM position, CPU/m...
Global Design Architecture 
The control loop of Entropy 
Scheduling policy 
I an algorithm to select the vjobs to run wrt....
Global Design Architecture 
The control loop of Entropy 
The cluster-wide context switch module 
I selects a position for ...
Global Design Architecture 
The control loop of Entropy 
Execution 
I associate each action of the plan with a driver that...
Global Design Implementation 
Role of the CW context switch 
I detects the actions to perform 
I selects a position for ea...
Global Design Implementation 
Plan the actions 
Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs ...
Global Design Implementation 
Plan the actions 
Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs ...
Global Design Implementation 
Plan the actions 
Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs ...
Global Design Implementation 
Plan the actions 
The reconfiguration plan 
I a protocol to execute actions 
I actions feasi...
Global Design Implementation 
Suspending/Resuming a vjob 
I inter-connected VMs should be continuously in the same state 
...
Global Design Implementation 
Suspending/Resuming a vjob 
I inter-connected VMs should be continuously in the same state 
...
Global Design Implementation 
Reducing the duration of a cluster-wide context switch 
45 
40 
35 
30 
25 
20 
15 
10 
5 
1...
Global Design Implementation 
Reducing the duration of a CW context switch 
An approach based on constraint programing 
En...
Proof of concept 
Agenda 
Motivation 
Global Design 
Architecture 
Implementation 
Proof of concept 
A sample scheduler 
E...
Proof of concept A sample scheduler 
A sample scheduler 
Principle 
I a FIFO queue 
I VMs are assigned to nodes using a Fi...
Proof of concept A sample scheduler 
A sample scheduler 
Principle 
I a FIFO queue 
I VMs are assigned to nodes using a Fi...
Proof of concept A sample scheduler 
A sample scheduler 
Principle 
I a FIFO queue 
I VMs are assigned to nodes using a Fi...
Proof of concept Experiment on a cluster 
Environment 
Hardware 
I 11 working nodes 
I 3 storage nodes share VM images 
I ...
Proof of concept Experiment on a cluster 
Experiment on a cluster 
Benefits 
I improve resource usage 
I suspend/resume tr...
Proof of concept Experiment on a cluster 
Experiment on a cluster 
Benefits 
I improve resource usage 
I suspend/resume tr...
Conclusion 
Agenda 
Motivation 
Global Design 
Architecture 
Implementation 
Proof of concept 
A sample scheduler 
Experim...
Conclusion 
Conclusion 
RMSs start to manage VMs instead of process 
I VMMs provide mechanisms to implement dynamic schedu...
Conclusion 
I’m looking for a postdoc 
position 
I fond of - virtualization, distributed systems, autonomic computing, . ....
Prochain SlideShare
Chargement dans…5
×

Cluster-Wide Context Switch of Virtualized Jobs

361 vues

Publié le

Fabien Hermenier, Adrien Lèbre, and Jean-Marc Menaud.
Proceeding of the international workshop Virtualization Techniques for Distributed Computing (VTDC'10), with the 19th ACM International Symposium on High Performance Distributed Computing (HPDC'10). ACM, New York, pages 658-666.

Publié dans : Logiciels
0 commentaire
0 j’aime
Statistiques
Remarques
  • Soyez le premier à commenter

  • Soyez le premier à aimer ceci

Aucun téléchargement
Vues
Nombre de vues
361
Sur SlideShare
0
Issues des intégrations
0
Intégrations
2
Actions
Partages
0
Téléchargements
5
Commentaires
0
J’aime
0
Intégrations 0
Aucune incorporation

Aucune remarque pour cette diapositive

Cluster-Wide Context Switch of Virtualized Jobs

  1. 1. Cluster-Wide Context Switch of Virtualized Jobs Fabien Hermenier, Adrien Lèbre, Jean-Marc Menaud VTDC’10, 22 June 2010 ASCOLA Team Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 1 / 23
  2. 2. Agenda Motivation Global Design Architecture Implementation Proof of concept A sample scheduler Experiment on a cluster Conclusion Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 2 / 23
  3. 3. Motivation Agenda Motivation Global Design Architecture Implementation Proof of concept A sample scheduler Experiment on a cluster Conclusion Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 3 / 23
  4. 4. Motivation Motivation Clusters I large infrastructures to execute various jobs Resource Management System (RMS) I manage the execution of jobs I resources are allocated to jobs according to their description I scheduling: which jobs to execute, and where ? Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 4 / 23
  5. 5. Motivation Jobs schedulers Usually A corse-grain exploitation of resources : I static allocation of resources I execution to completion Dynamic schedulers exist Based on mechanisms that manipulate the jobs dynamically (migration, preemption, dynamic allocation of resources, . . . ). BUT I mechanisms are complex to implement I mechanisms are complex to use efficiently Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 5 / 23
  6. 6. Motivation Motivation Virtual Machines (VMs) as a backend for dynamic schedulers I each component is embedded into its VM I VMMs provide migration, preemption I still complex to use efficiently A cutting-edge building block dynamic consolidation, best-effort jobs , . . . I various policies, but common concepts to perform the changes I each provides an ad-hoc solution to handle several common issues: I dependencies between actions I correctness I reactivity Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 6 / 23
  7. 7. Motivation Proposition Performing the changes should not be a primary concern for developers I a generic cluster-wide context switch based on VMs I developers only focus on the algorithm to select the jobs to run I the cluster-wide context switch takes care of the rest I detects the changes to perform I ensures the correctness of the transition I computes the fastest possible transition The implementation leverages the consolidation manager Entropy Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 7 / 23
  8. 8. Global Design Agenda Motivation Global Design Architecture Implementation Proof of concept A sample scheduler Experiment on a cluster Conclusion Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 8 / 23
  9. 9. Global Design Architecture From jobs to virtualized Jobs Figure: The life cycle of a vjob I a vjob encapsulates one or several VMs I to change the state of a vjob, actions (except migrate) are executed on each VMs Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 9 / 23
  10. 10. Global Design Architecture Configuration I describes the assignment of the running VMs to working nodes I nodes provide CPU and memory resources I running VMs require CPU and memory resources to run at peak level (a) Non-viable configuration (b) Viable configuration Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 10 / 23
  11. 11. Global Design Architecture The control loop of Entropy Monitor I extract the current configuration: VM position, CPU/memory consumption I adaptable to a specific monitoring system (currently Ganglia) Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 11 / 23
  12. 12. Global Design Architecture The control loop of Entropy Scheduling policy I an algorithm to select the vjobs to run wrt. the current configuration I provided by a developer Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 11 / 23
  13. 13. Global Design Architecture The control loop of Entropy The cluster-wide context switch module I selects a position for each VM to run I infers the actions that make the transition w. the current configuration I computes the fastest plan that ensure the correctness of the process Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 11 / 23
  14. 14. Global Design Architecture The control loop of Entropy Execution I associate each action of the plan with a driver that performs the action I adaptable to specific environments. Currently support Xen VMM (XML-RPC) or shell command Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 11 / 23
  15. 15. Global Design Implementation Role of the CW context switch I detects the actions to perform I selects a position for each VM to run I plans the actions to guarantee the correctness of the process I computes the fastest possible plan Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 12 / 23
  16. 16. Global Design Implementation Plan the actions Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 13 / 23
  17. 17. Global Design Implementation Plan the actions Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 13 / 23
  18. 18. Global Design Implementation Plan the actions Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 13 / 23
  19. 19. Global Design Implementation Plan the actions The reconfiguration plan I a protocol to execute actions I actions feasible in parallel are grouped into a same step I steps are executed sequentially Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 13 / 23
  20. 20. Global Design Implementation Suspending/Resuming a vjob I inter-connected VMs should be continuously in the same state I coordination to ensure that distributed applications will not fail Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 14 / 23
  21. 21. Global Design Implementation Suspending/Resuming a vjob I inter-connected VMs should be continuously in the same state I coordination to ensure that distributed applications will not fail I actions are grouped into a same step I synchronization between the pause/unpause actions Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 14 / 23
  22. 22. Global Design Implementation Reducing the duration of a cluster-wide context switch 45 40 35 30 25 20 15 10 5 128 256 start/run migrate stop/shutdown 1024 2048 VM size (in MB) Completion time (in sec) 0 512 Completion time 200 150 100 50 0 128 256 (in sec) local nfs localïscp localïrsync 1024 2048 VM size (in MB) 512 I the duration of an action depends on its context I a function estimates the cost of a whole CW context switch Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 15 / 23
  23. 23. Global Design Implementation Reducing the duration of a CW context switch An approach based on constraint programing Entropy computes a new configuration that I is viable I respects the scheduling policy I implies the minimal cost In practice I actions are performed asap. I prefer moving VMs will small memory requirements I avoid migrations and remote resumes Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 16 / 23
  24. 24. Proof of concept Agenda Motivation Global Design Architecture Implementation Proof of concept A sample scheduler Experiment on a cluster Conclusion Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 17 / 23
  25. 25. Proof of concept A sample scheduler A sample scheduler Principle I a FIFO queue I VMs are assigned to nodes using a First Fit Decrease heuristic I priority between jobs to prevent starvation Example Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 18 / 23
  26. 26. Proof of concept A sample scheduler A sample scheduler Principle I a FIFO queue I VMs are assigned to nodes using a First Fit Decrease heuristic I priority between jobs to prevent starvation Example Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 18 / 23
  27. 27. Proof of concept A sample scheduler A sample scheduler Principle I a FIFO queue I VMs are assigned to nodes using a First Fit Decrease heuristic I priority between jobs to prevent starvation Benefits using CW context switch I dynamic allocation of resources I preemption I migration of VMs Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 18 / 23
  28. 28. Proof of concept Experiment on a cluster Environment Hardware I 11 working nodes I 3 storage nodes share VM images I 1 service node is running Entropy Protocol I a queue of 8 vjobs (NASGrid benchmarks) I each vjob uses 9 VMs I comparison with regards to FCFS I resources usage I completion time Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 19 / 23
  29. 29. Proof of concept Experiment on a cluster Experiment on a cluster Benefits I improve resource usage I suspend/resume transparent for the developer Resources usage Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 20 / 23
  30. 30. Proof of concept Experiment on a cluster Experiment on a cluster Benefits I improve resource usage I suspend/resume transparent for the developer I reduce the completion time Cumulated execution time I FCFS: 250 minutes I Entropy: 150 minutes Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 20 / 23
  31. 31. Conclusion Agenda Motivation Global Design Architecture Implementation Proof of concept A sample scheduler Experiment on a cluster Conclusion Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 21 / 23
  32. 32. Conclusion Conclusion RMSs start to manage VMs instead of process I VMMs provide mechanisms to implement dynamic schedulers I manipulate VMs is tedious and may be non cost-effective I various scheduling policies but common concepts to perform the context switch A generic cluster-wide context switch I make the implementation of dynamic schedulers easier I the context switch is outside the scheduling algorithm I an implementation in Entropy with a sample algorithm http://entropy.gforge.inria.fr version 1.2 (LGPL) Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 22 / 23
  33. 33. Conclusion I’m looking for a postdoc position I fond of - virtualization, distributed systems, autonomic computing, . . . I dislike - tomatoes Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 23 / 23

×