Cluster-Wide Context Switch of Virtualized Jobs

Fabien Hermenier, Adrien Lèbre, Jean-Marc Menaud 
VTDC’10, 22 June 2010 
ASCOLA Team 
Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 1 / 23
Agenda 
Motivation 
Global Design 
Architecture 
Implementation 
Proof of concept 
A sample scheduler 
Experiment on a cluster 
Conclusion 
Motivation 
Clusters
- large infrastructures to execute various jobs
Resource Management System (RMS)
- manages the execution of jobs
- resources are allocated to jobs according to their description
- scheduling: which jobs to execute, and where?
Job schedulers
Usually
A coarse-grained exploitation of resources:
- static allocation of resources
- execution to completion
Dynamic schedulers exist
Based on mechanisms that manipulate the jobs dynamically (migration, preemption, dynamic allocation of resources, ...). BUT
- mechanisms are complex to implement
- mechanisms are complex to use efficiently
Virtual Machines (VMs) as a backend for dynamic schedulers
- each component is embedded into its own VM
- VMMs provide migration and preemption
- still complex to use efficiently
A cutting-edge building block
Dynamic consolidation, best-effort jobs, ...
- various policies, but common concepts to perform the changes
- each provides an ad-hoc solution to handle several common issues:
  - dependencies between actions
  - correctness
  - reactivity
Proposition
Performing the changes should not be a primary concern for developers
- a generic cluster-wide context switch based on VMs
- developers only focus on the algorithm to select the jobs to run
- the cluster-wide context switch takes care of the rest:
  - detects the changes to perform
  - ensures the correctness of the transition
  - computes the fastest possible transition
The implementation leverages the consolidation manager Entropy
Global Design 
Architecture
From jobs to virtualized jobs (vjobs)
Figure: The life cycle of a vjob
- a vjob encapsulates one or several VMs
- to change the state of a vjob, actions (except migrate) are executed on each of its VMs
Configuration
- describes the assignment of the running VMs to working nodes
- nodes provide CPU and memory resources
- running VMs require CPU and memory resources to run at peak level
Figure: (a) Non-viable configuration (b) Viable configuration
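The notion of a viable configuration can be illustrated with a minimal sketch. It assumes that a configuration is viable when, on every node, the summed CPU and memory demands of the hosted running VMs do not exceed what the node provides. The class names, the function name and the units are hypothetical and are not taken from Entropy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    cpu: int  # CPU capacity (arbitrary units, assumption)
    mem: int  # memory capacity in MB (assumption)


@dataclass(frozen=True)
class VM:
    name: str
    cpu: int  # CPU demand to run at peak level
    mem: int  # memory demand in MB


def is_viable(nodes, vms, placement):
    """Return True if every node can satisfy the demands of the VMs it hosts.

    `placement` maps the name of each running VM to the name of its node.
    """
    vm_by_name = {vm.name: vm for vm in vms}
    used = {node.name: [0, 0] for node in nodes}  # [cpu, mem] per node
    for vm_name, node_name in placement.items():
        vm = vm_by_name[vm_name]
        used[node_name][0] += vm.cpu
        used[node_name][1] += vm.mem
    return all(
        used[node.name][0] <= node.cpu and used[node.name][1] <= node.mem
        for node in nodes
    )


nodes = [Node("n1", cpu=2, mem=4096), Node("n2", cpu=2, mem=4096)]
vms = [VM("vm1", 1, 2048), VM("vm2", 1, 2048), VM("vm3", 2, 1024)]

overloaded = {"vm1": "n1", "vm2": "n1", "vm3": "n1"}  # n1 needs 4 CPU units
balanced = {"vm1": "n1", "vm2": "n1", "vm3": "n2"}
print(is_viable(nodes, vms, overloaded), is_viable(nodes, vms, balanced))
```

Under these assumptions the first placement is non-viable because node n1 is asked for more CPU than it provides, while the second one fits on both nodes.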
The control loop of Entropy 
Monitor
- extracts the current configuration: VM position, CPU/memory consumption
- adaptable to a specific monitoring system (currently Ganglia)
Scheduling policy
- an algorithm to select the vjobs to run with respect to the current configuration
- provided by a developer
The cluster-wide context switch module
- selects a position for each VM to run
- infers the actions that make the transition from the current configuration
- computes the fastest plan that ensures the correctness of the process
Execution
- associates each action of the plan with a driver that performs the action
- adaptable to specific environments; currently supports the Xen VMM (XML-RPC) or shell commands
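The four stages of the control loop (monitor, scheduling policy, cluster-wide context switch, execution) can be sketched as one iteration over pluggable functions. This is only an illustration of the separation of concerns described above: the function name, the type aliases and the string encoding of actions are hypothetical and do not reflect Entropy's actual API.

```python
from typing import Callable, Dict, List, Set

Configuration = Dict[str, str]  # running VM name -> node name (assumption)
Step = List[str]                # actions that may run in parallel
Plan = List[Step]               # steps are executed one after the other


def run_iteration(
    monitor: Callable[[], Configuration],
    policy: Callable[[Configuration], Set[str]],
    context_switch: Callable[[Configuration, Set[str]], Plan],
    execute: Callable[[str], None],
) -> Plan:
    """Run one iteration of the loop with caller-supplied stages."""
    current = monitor()                         # 1. observe the cluster
    vjobs_to_run = policy(current)              # 2. developer-provided choice
    plan = context_switch(current, vjobs_to_run)  # 3. compute the plan
    for step in plan:                           # 4. apply the plan step by step
        for action in step:
            execute(action)
    return plan


# Usage with stub stages: only the policy would be written by the developer.
executed: List[str] = []
plan = run_iteration(
    monitor=lambda: {"vm1": "n1"},
    policy=lambda config: {"vjob2"},
    context_switch=lambda config, vjobs: [["suspend vm1"], ["run vm2"]],
    execute=executed.append,
)
print(executed)
```

The point of the design is visible in the signature: the scheduling policy only returns the vjobs to run, and never manipulates VMs itself.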
Implementation
Role of the CW (cluster-wide) context switch
- detects the actions to perform
- selects a position for each VM to run
- plans the actions to guarantee the correctness of the process
- computes the fastest possible plan
Plan the actions 
The reconfiguration plan
- a protocol to execute actions
- actions feasible in parallel are grouped into the same step
- steps are executed sequentially
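The grouping of actions into sequential steps can be sketched with a simple greedy planner. It assumes a single resource (memory), that an action is feasible when its destination node has enough free memory, and that memory released by an action only becomes available once its step has finished. The `Action` class and `plan_steps` function are hypothetical, and a cycle of inter-dependent actions is reported as an error rather than resolved.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Action:
    kind: str                  # e.g. "run", "stop", "migrate"
    vm: str
    mem: int                   # memory of the VM in MB
    src: Optional[str] = None  # node that is freed, if any
    dst: Optional[str] = None  # node that is consumed, if any


def plan_steps(free_mem: Dict[str, int], actions: List[Action]) -> List[List[Action]]:
    """Greedily group the actions that are feasible in parallel into steps."""
    free = dict(free_mem)
    pending = list(actions)
    plan: List[List[Action]] = []
    while pending:
        step = []
        for action in pending:
            if action.dst is None or free[action.dst] >= action.mem:
                step.append(action)
                if action.dst is not None:
                    free[action.dst] -= action.mem  # reserve immediately
        if not step:
            raise RuntimeError("no feasible action: inter-dependent actions")
        for action in step:
            if action.src is not None:
                free[action.src] += action.mem  # released at the end of the step
        pending = [a for a in pending if a not in step]
        plan.append(step)
    return plan


free_mem = {"n1": 0, "n2": 1024}
actions = [
    Action("run", "vm2", 1024, dst="n1"),
    Action("migrate", "vm1", 1024, src="n1", dst="n2"),
    Action("stop", "vm3", 512, src="n2"),
]
steps = plan_steps(free_mem, actions)
print([[a.kind for a in step] for step in steps])
```

In this example the run of vm2 has to wait for the migration of vm1 to free memory on n1, so it lands in a second step, while the migration and the stop can share the first one.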
Suspending/Resuming a vjob
- inter-connected VMs should be continuously in the same state
- coordination to ensure that distributed applications will not fail
- actions are grouped into the same step
- synchronization between the pause/unpause actions
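One possible way to synchronize the pause actions of a vjob is sketched below with a thread barrier: every VM is paused before any VM starts saving its state, so the VMs of the vjob never stay in different states for long. This is an illustrative assumption about how such coordination could be done, not a description of Entropy's drivers; `suspend_vjob`, `pause` and `save_state` are hypothetical names.

```python
import threading
from typing import Callable, List


def suspend_vjob(
    vm_names: List[str],
    pause: Callable[[str], None],
    save_state: Callable[[str], None],
) -> None:
    """Suspend all the VMs of a vjob, pausing them together first."""
    barrier = threading.Barrier(len(vm_names))

    def worker(vm: str) -> None:
        barrier.wait()   # wait until every worker is ready to pause
        pause(vm)
        barrier.wait()   # wait until every VM of the vjob is paused
        save_state(vm)   # the slow part: write the VM state to disk

    threads = [threading.Thread(target=worker, args=(vm,)) for vm in vm_names]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


# Usage with stub drivers that only record what happened and in which order.
events = []
suspend_vjob(
    ["vm1", "vm2", "vm3"],
    pause=lambda vm: events.append(("pause", vm)),
    save_state=lambda vm: events.append(("save", vm)),
)
print(events)
```

Resuming could follow the mirrored pattern: restore every state first, then unpause all the VMs after a barrier.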
Reducing the duration of a cluster-wide context switch 
Figure: completion time (in sec) versus VM size (in MB; 128, 256, 512, 1024, 2048). First panel: start/run, migrate, stop/shutdown. Second panel: local, nfs, local+scp, local+rsync.
- the duration of an action depends on its context
- a function estimates the cost of a whole CW context switch
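A cost function of this kind can be sketched as follows. The model is an assumption made for illustration: each action has a fixed part and a part proportional to the memory of the VM, actions of a step run in parallel so a step costs as much as its most expensive action, and the plan costs the sum of its steps. The coefficients are placeholders, not values measured in the experiments above.

```python
# (fixed cost, cost per MB) for each kind of action: placeholder values only.
COST_MODEL = {
    "run": (1.0, 0.0),
    "stop": (1.0, 0.0),
    "migrate": (0.0, 1.0),
    "suspend": (0.0, 1.0),
    "resume_local": (0.0, 1.0),
    "resume_remote": (0.0, 2.0),  # assumed costlier: the state is moved first
}


def action_cost(kind, mem_mb):
    """Estimate the cost of one action from its kind and the VM memory."""
    fixed, per_mb = COST_MODEL[kind]
    return fixed + per_mb * mem_mb


def plan_cost(plan):
    """Estimate the cost of a plan given as a list of steps.

    Each step is a list of (kind, mem_mb) pairs executed in parallel.
    """
    return sum(
        max(action_cost(kind, mem) for kind, mem in step)
        for step in plan
        if step
    )


plan = [
    [("migrate", 1024), ("suspend", 512)],
    [("resume_remote", 256), ("run", 2048)],
]
print(plan_cost(plan))
```

With these placeholder coefficients, the context of an action matters: resuming a 256 MB VM remotely is estimated to cost as much as suspending a 512 MB one.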
An approach based on constraint programming
Entropy computes a new configuration that
- is viable
- respects the scheduling policy
- implies the minimal cost
In practice
- actions are performed as soon as possible
- prefer moving VMs with small memory requirements
- avoid migrations and remote resumes
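The search for a viable configuration of minimal cost can be illustrated on a tiny instance. The sketch below uses an exhaustive search as a stand-in for a constraint solver, which is only practical for a handful of VMs. It considers memory only and charges the memory of every VM that has to leave its current node. The function name and the cost model are hypothetical.

```python
from itertools import product


def cheapest_viable_placement(node_mem, vm_mem, current):
    """Return (placement, cost) for the cheapest viable placement, or (None, None).

    node_mem: node name -> memory capacity in MB
    vm_mem:   VM name -> memory demand in MB (every VM listed must run)
    current:  VM name -> node name for the VMs that are already running
    """
    vms = sorted(vm_mem)
    nodes = sorted(node_mem)
    best, best_cost = None, None
    for choice in product(nodes, repeat=len(vms)):
        placement = dict(zip(vms, choice))
        used = {node: 0 for node in nodes}
        for vm, node in placement.items():
            used[node] += vm_mem[vm]
        if any(used[node] > node_mem[node] for node in nodes):
            continue  # not viable: some node is overloaded
        # A VM that is already running on another node must be migrated.
        cost = sum(
            vm_mem[vm]
            for vm in vms
            if current.get(vm) not in (None, placement[vm])
        )
        if best_cost is None or cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost


node_mem = {"n1": 2048, "n2": 2048}
vm_mem = {"a": 1536, "b": 1024, "c": 512}
current = {"a": "n1", "b": "n1"}  # non-viable: n1 hosts 2560 MB; "c" is new

placement, cost = cheapest_viable_placement(node_mem, vm_mem, current)
print(placement, cost)
```

The search keeps the large VM in place and moves the smaller one, which matches the rule of thumb of preferring to move VMs with small memory requirements.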
Proof of concept 
A sample scheduler
Principle
- a FIFO queue
- VMs are assigned to nodes using a First Fit Decreasing heuristic
- priority between jobs to prevent starvation
Benefits of using the CW context switch
- dynamic allocation of resources
- preemption
- migration of VMs
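The principle of the sample scheduler, a FIFO queue combined with a First Fit Decreasing placement, can be sketched as follows. The sketch considers memory only, accepts a vjob only if all of its VMs fit, and omits the priority mechanism against starvation. All names are hypothetical and the code is not the scheduler shipped with Entropy.

```python
def first_fit_decreasing(vm_mem, node_free):
    """Place VMs, largest first, on the first node with enough free memory.

    Returns (placement, remaining free memory), or None if a VM does not fit.
    """
    free = dict(node_free)
    placement = {}
    for vm in sorted(vm_mem, key=vm_mem.get, reverse=True):
        for node in free:
            if free[node] >= vm_mem[vm]:
                placement[vm] = node
                free[node] -= vm_mem[vm]
                break
        else:
            return None  # no node can host this VM
    return placement, free


def select_vjobs(queue, node_free):
    """Walk the queue in FIFO order and keep the vjobs that fit entirely."""
    free = dict(node_free)
    selected, placement = [], {}
    for vjob, vm_mem in queue:
        result = first_fit_decreasing(vm_mem, free)
        if result is None:
            continue  # the vjob stays in the queue
        vjob_placement, free = result
        placement.update(vjob_placement)
        selected.append(vjob)
    return selected, placement


queue = [
    ("j1", {"j1-a": 1024, "j1-b": 1024}),
    ("j2", {"j2-a": 2048}),
    ("j3", {"j3-a": 512, "j3-b": 512}),
]
selected, placement = select_vjobs(queue, {"n1": 2048, "n2": 1024})
print(selected, placement)
```

Here j2 does not fit once j1 is placed, so the scheduler moves on and starts j3 on the remaining memory. The scheduler only returns the selection; suspending, resuming or migrating VMs to reach it is left to the cluster-wide context switch.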
Experiment on a cluster
Environment
Hardware
- 11 working nodes
- 3 storage nodes share VM images
- 1 service node is running Entropy
Protocol
- a queue of 8 vjobs (NASGrid benchmarks)
- each vjob uses 9 VMs
- comparison with FCFS (first come, first served):
  - resource usage
  - completion time
Benefits
- improve resource usage
- suspend/resume is transparent for the developer
- reduce the completion time
Figure: resource usage
Cumulated execution time
- FCFS: 250 minutes
- Entropy: 150 minutes
Conclusion 
RMSs start to manage VMs instead of processes
- VMMs provide mechanisms to implement dynamic schedulers
- manipulating VMs is tedious and may not be cost-effective
- various scheduling policies but common concepts to perform the context switch
A generic cluster-wide context switch
- makes the implementation of dynamic schedulers easier
- the context switch is outside the scheduling algorithm
- an implementation in Entropy with a sample algorithm
http://entropy.gforge.inria.fr
version 1.2 (LGPL)
I’m looking for a postdoc position
- fond of: virtualization, distributed systems, autonomic computing, ...
- dislike: tomatoes