Autonomic SLA-driven
Provisioning for Cloud
Applications

 Nicolas Bonvin, Thanasis Papaioannou, Karl Aberer

 CCGRID 2011, May 23-26 2011, Newport Beach, CA, USA

 nicolas.bonvin@epfl.ch
 LSIR - EPFL
Cloud Apps – Issue #1: Placement

    ●    A distributed, component-based application running on an elastic
         infrastructure
    ●    Performance of C1, C2 and C3 is probably lower than that of C4
    ●    No information about other VMs colocated on the same server!

    [Figure: components C1 to C4 deployed on VM1, VM2 and VM3, which are
     hosted on Server 1 and Server 2. Caption: "No control on placement"]

2   EPFL – LSIR - Nicolas Bonvin
Cloud Apps – Issue #2: Instability

     ●    Load-balanced traffic to 4 identical components (C1) on 4 identical VMs
                      –    VM performance can vary by up to a factor of 4! [Dej2009]
                                    ●   Physical server, hypervisor, storage, ...
                                    ●   Component overloaded
                                    ●   Component bug, crash, deadlock, ...
                                    ●   Failure of C1 on VM4 -> load is rebalanced

     Response time of C1 on each VM as the problems accumulate:

          Step                                VM1       VM2       VM3       VM4
          Balanced load                       100 ms    100 ms    100 ms    100 ms
          VM performance varies               100 ms    140 ms    100 ms    100 ms
          Component overloaded                130 ms    140 ms    100 ms    100 ms
          Component bug, crash, deadlock      130 ms    140 ms    100 ms    infinity
          Load rebalanced after failure       140 ms    150 ms    130 ms    infinity

                                          Application should react early!

Cloud Apps – Overview

     ●    Build for failures
                      –   Do not trust the underlying infrastructure
                      –   Do not trust your components either !
     ●    Components should adapt to the changing conditions
                      –   Quickly
                      –   Automatically
                      –   e.g. by replacing a wonky VM with a new one




Scarce:
a framework to build scalable cloud applications
Architecture Overview

     ●    An agent on each server / VM
                      –    Starts, stops and monitors the components
                      –    Makes decisions on behalf of the components
     ●    Each agent communicates with the other agents
                      –    Routing table
                      –    Status of the server (resource usage)


     [Figure: a server hosting components A, B and E together with its agent;
      the agents of all servers exchange information by gossiping and broadcast]
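
The exchange of server status between agents can be pictured with a minimal sketch. Everything in it is an assumption made for illustration and not Scarce's actual protocol: the `Agent` class, the `(epoch, usage)` entries and the "most recent epoch wins" merge rule are all hypothetical.

```python
import random


class Agent:
    """Hypothetical agent holding a local view of every server's resource usage."""

    def __init__(self, name, usage):
        self.name = name
        # view maps server name -> (epoch, usage); initially the agent only knows itself
        self.view = {name: (0, usage)}

    def update_local(self, epoch, usage):
        """Record this server's own usage for a new epoch."""
        self.view[self.name] = (epoch, usage)

    def merge(self, other_view):
        """Keep, for each server, the entry with the most recent epoch."""
        for server, entry in other_view.items():
            if server not in self.view or entry[0] > self.view[server][0]:
                self.view[server] = entry

    def gossip_with(self, peer):
        """Exchange views with one peer in both directions (push-pull)."""
        snapshot = dict(self.view)
        self.merge(peer.view)
        peer.merge(snapshot)


def gossip_round(agents, rng):
    """Each agent gossips with one randomly chosen other agent."""
    for agent in agents:
        peer = rng.choice([a for a in agents if a is not agent])
        agent.gossip_with(peer)


agents = [Agent(f"server{i}", usage=0.1 * i) for i in range(1, 6)]
rng = random.Random(0)
rounds = 0
# Gossip until every agent knows every server (capped to keep the demo bounded).
while any(len(a.view) < len(agents) for a in agents) and rounds < 100:
    gossip_round(agents, rng)
    rounds += 1
print(f"gossip rounds executed: {rounds}")
```

No agent needs a global directory here: each one only talks to a random peer, which matches the decentralized intent of the slide.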


An economic approach

     ●    Time is split into epochs (no synchronization between servers)
     ●    Servers charge a virtual rent for hosting a component according to
                      –   Current resource usage (I/O, CPU, ...) of the server
                      –   Technical factors (HW, connectivity, ...)
                      –   Non-technical factors (country stability, ...)


     ●    Components
                      –   Pay virtual rent at each epoch
                      –   Gain virtual money by processing requests
                      –   Make decisions based on balance (= gain – rent)
                                    ●   Replicate, migrate, suicide, stay

     ●    Virtual rents are updated by gossiping (no centralized board)
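
The per-epoch decision of a component can be sketched as follows. The slide only names the balance (gain minus rent) and the four possible actions; the thresholds, the `patience` counter, the field names and the order of the rules below are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ComponentState:
    gain: float                 # virtual money earned during this epoch
    rent: float                 # virtual rent charged by the current server
    cheapest_other_rent: float  # lowest rent advertised by another server
    negative_epochs: int = 0    # consecutive epochs with a negative balance
    replicas: int = 1           # live replicas of this component


def decide(state, patience=3):
    """Return 'replicate', 'migrate', 'suicide' or 'stay' for one epoch.

    Updates state.negative_epochs as a side effect.
    """
    balance = state.gain - state.rent
    if balance < 0:
        state.negative_epochs += 1
        if state.negative_epochs >= patience:
            # Assumption: first try a cheaper server, then give up if redundant.
            if state.cheapest_other_rent < state.rent:
                return "migrate"
            if state.replicas > 1:
                return "suicide"
        return "stay"
    state.negative_epochs = 0
    # Assumption: replicate only if the surplus could pay a second rent.
    if balance >= state.cheapest_other_rent:
        return "replicate"
    return "stay"


busy = ComponentState(gain=10.0, rent=2.0, cheapest_other_rent=3.0)
idle = ComponentState(gain=1.0, rent=2.0, cheapest_other_rent=5.0,
                      negative_epochs=2, replicas=2)
print(decide(busy), decide(idle))
```

Requiring several consecutive negative epochs before migrating or terminating is one simple way to avoid reacting to a single quiet epoch.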

Economic model (i)




     ●    The rent of a server is different for each component !


Economic model (ii)

                         CPU       I/O
          C1             30%       5%
          VM1            70%       20%
          VM2            25%       65%

     [Figure: C1 has to choose between VM1 and VM2]

     ●    VM1 and VM2 have an "identical" overall resource usage: 45%
     ●    Server rent = server's resource usage weighted by the component's weights
                      –   Rent for C1 @ VM1 > rent for C1 @ VM2


                                         Multiplexing of server resources
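
The numbers on this slide can be checked with a short worked example. It assumes that the rent is a plain weighted sum of the server's usage, with C1's own CPU and I/O percentages used directly as weights; the exact formula and any normalization are not given on the slide, so `rent` below is a hypothetical stand-in. Under that assumption, C1 pays 0.30 x 70% + 0.05 x 20% = 22% on VM1 but only 0.30 x 25% + 0.05 x 65% = 10.75% on VM2.

```python
def rent(server_usage, component_weights):
    """Weighted sum of the server's usage, using the component's weights."""
    return sum(component_weights[r] * server_usage[r] for r in component_weights)


def mean_usage(server_usage):
    """Unweighted average usage across resources."""
    return sum(server_usage.values()) / len(server_usage)


c1 = {"cpu": 0.30, "io": 0.05}   # C1 is mostly CPU-bound
vm1 = {"cpu": 0.70, "io": 0.20}
vm2 = {"cpu": 0.25, "io": 0.65}

for name, vm in (("VM1", vm1), ("VM2", vm2)):
    print(f"{name}: mean usage {mean_usage(vm):.2f}, rent for C1 {rent(vm, c1):.4f}")
```

Both VMs look equally loaded on average, yet the CPU-bound C1 is charged roughly twice as much on the CPU-heavy VM1, which steers it towards VM2.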

Economic model (iii)

     ●    Choosing a candidate server j during replication/migration of a
          component i
                      –   net benefit maximization




     ●    2 optimization goals:
                      –   high availability through geographical diversity of replicas
                      –   low latency by grouping related components
     ●    g_j: weight related to the proximity of the server location to the
          geographical distribution of the client requests to the component
     ●    S_i is the set of servers hosting a replica of component i


SLA Performance Guarantees (i)

     ●    Each component has its own SLA constraints
     ●    SLA derived directly from entry components


     [Figure: entry component C1 (SLA: 500 ms) and its downstream
      components C2, C3, C4 and C5]




     ●    Resp. Time = Service Time + max (Resp. Time of Dependencies)
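
The response-time rule above can be evaluated recursively. In this sketch the dependency graph (C1 calls C2 and C3, C2 calls C4, C3 calls C5) is an assumed reading of the figure, and the service times are made-up values chosen only to exercise the formula.

```python
def response_time(component, service_time, dependencies):
    """Resp. Time = Service Time + max(Resp. Time of Dependencies)."""
    deps = dependencies.get(component, [])
    slowest = max(
        (response_time(d, service_time, dependencies) for d in deps),
        default=0.0,  # a leaf component has no dependency to wait for
    )
    return service_time[component] + slowest


# Assumed topology and illustrative service times in milliseconds.
dependencies = {"C1": ["C2", "C3"], "C2": ["C4"], "C3": ["C5"]}
service_time = {"C1": 100.0, "C2": 80.0, "C3": 120.0, "C4": 150.0, "C5": 60.0}

total = response_time("C1", service_time, dependencies)
print(f"C1 response time: {total} ms, SLA met: {total <= 500.0}")
```

The `max` reflects that dependencies are assumed to be called in parallel, so only the slowest branch adds to the parent's response time.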




SLA Performance Guarantees (ii)

     ●    SLA propagation from parents to children
     ●    Parent j sends its performance constraints (e.g. response time upper
          bound) to its dependencies D(j) :




     ●    Child i computes its own performance constraints :




     ●         : group of constraints sent by the replicas of the parent g




SLA Performance Guarantees (iii)

     ●    SLA propagation from parents to children




Automatic Provisioning

     ●    Usage of allocated resources is maximized :
                      –   autonomic migration / replication / suicide of components
                      –   not enough to ensure end-to-end response time


     ●    Cloud resources managed by framework via cloud API

     ●    Each individual component has to satisfy its own SLA
                      –   SLA easily met -> decrease resources (scale down)
                      –   SLA not met -> increase resources (scale up, scale out)
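
The scale-up / scale-down rule can be written as a small decision function. This is only a sketch: the `slack` factor that defines "easily met", the preference for adding cores before adding servers, and the function name are assumptions, not details from the slides.

```python
def provisioning_action(observed_ms, sla_ms, cores, max_cores, slack=0.5):
    """Pick a provisioning action for one component from its response time."""
    if observed_ms > sla_ms:
        # SLA not met: add a core if the server still has room, else add a server.
        return "scale up" if cores < max_cores else "scale out"
    if observed_ms < slack * sla_ms and cores > 1:
        # SLA easily met: release resources.
        return "scale down"
    return "keep"


print(provisioning_action(620.0, 500.0, cores=1, max_cores=2))
print(provisioning_action(620.0, 500.0, cores=2, max_cores=2))
print(provisioning_action(120.0, 500.0, cores=2, max_cores=2))
```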




Adaptivity to slow servers

     ●    Each component keeps statistics about its children
                      –   e.g. 95th percentile response time
     ●    A routing coefficient is computed for each child at each epoch
                      –   Send more requests to more performant children
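
One simple way to turn per-child statistics into routing coefficients is to weight each child by the inverse of its 95th percentile response time. The slides do not give the formula, so the inverse-latency rule and the function name below are assumptions. The latencies reuse the values from the instability example, with the failed replica modelled as an infinite response time.

```python
import random


def routing_coefficients(p95_ms):
    """Weight each child by 1 / p95 and normalize so the weights sum to 1."""
    inverse = {child: 1.0 / t for child, t in p95_ms.items()}
    total = sum(inverse.values())
    return {child: v / total for child, v in inverse.items()}


# 95th percentile response times observed during the last epoch (ms).
p95 = {"VM1": 130.0, "VM2": 140.0, "VM3": 100.0, "VM4": float("inf")}
coeffs = routing_coefficients(p95)
for child, c in coeffs.items():
    print(f"{child}: {c:.3f}")

# Route one request according to the coefficients.
rng = random.Random(42)
children = list(coeffs)
target = rng.choices(children, weights=[coeffs[c] for c in children], k=1)[0]
print("next request goes to", target)
```

With this rule the fastest child receives the largest share and a child that never answers receives no traffic at all.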




Evaluation
Evaluation: Setup

     ●    5 components, mostly CPU-intensive (w_c >> w_m, w_n, w_d)

     [Figure: entry component C1 (SLA: 500 ms) and its downstream
      components C2, C3, C4 and C5]

     ●    8 8-core servers (Intel Core i7 920, 2.67 GHz, 8 GB, Linux 2.6.32-trunk-amd64)
     ●    d = 0, C = 110, k = 10000, xs* = 25%




Adaptation to Varying Load (i)

     ●    5 rps to 60 rps at minute 8, step 5 rps/min
     ●    Static setup : 2 servers with 2 cores




Adaptation to Varying Load (ii)

     ●    5 rps to 60 rps at minute 8, step 5 rps/min
     ●    Static setup : 2 servers with 2 cores




Adaptation to Slow Server

     ●    Max 2 cores/server, 25 rps
     ●    At minute 4, a server gets slower (200 ms delay)




Scalability

     ●    Add 5 rps per minute until 150 rps
     ●    Max 6 cores/server




Conclusion

     ●    Framework for building cloud applications
     ●    Elasticity : add/remove resources
     ●    High Availability : software, hardware, network failures
     ●    Scalability : growing load, peaks, scaling down, ...
                      –   Quick replication of busy components
     ●    Load Balancing : load has to be shared by all available servers
                      –   Replication of busy components
                      –   Migration of less busy components
                      –   Reach equilibrium when load is stable
     ●    SLA performance guarantees
                      –   Automatic provisioning
     ●    No synchronization, fully decentralized



Thank you !

Contenu connexe

En vedette

Data SLA in the public cloud
Data SLA in the public cloudData SLA in the public cloud
Data SLA in the public cloud
Liran Zelkha
 
Aims2011 slacc-presentation final-version
Aims2011 slacc-presentation final-versionAims2011 slacc-presentation final-version
Aims2011 slacc-presentation final-version
ictseserv
 
Forecast 2014 Keynote: State of Cloud Migration…What's Occurring Now, and Wha...
Forecast 2014 Keynote: State of Cloud Migration…What's Occurring Now, and Wha...Forecast 2014 Keynote: State of Cloud Migration…What's Occurring Now, and Wha...
Forecast 2014 Keynote: State of Cloud Migration…What's Occurring Now, and Wha...
Open Data Center Alliance
 
Assess enterprise applications for cloud migration
Assess enterprise applications for cloud migrationAssess enterprise applications for cloud migration
Assess enterprise applications for cloud migration
nanda1505
 
How we measure quality of JIRA deployments to Cloud?
How we measure quality of JIRA deployments to Cloud?How we measure quality of JIRA deployments to Cloud?
How we measure quality of JIRA deployments to Cloud?
Stowarzyszenie Jakości Systemów Informatycznych (SJSI)
 
SLAs in Virtualized Cloud Computing Infrastructures with QoS Assurance
SLAs in Virtualized Cloud Computing Infrastructures with QoS AssuranceSLAs in Virtualized Cloud Computing Infrastructures with QoS Assurance
SLAs in Virtualized Cloud Computing Infrastructures with QoS Assurance
tcucinotta
 

En vedette (17)

Innovation with Open Source: The New South Wales Judicial Commission experience
Innovation with Open Source: The New South Wales Judicial Commission experienceInnovation with Open Source: The New South Wales Judicial Commission experience
Innovation with Open Source: The New South Wales Judicial Commission experience
 
Data SLA in the public cloud
Data SLA in the public cloudData SLA in the public cloud
Data SLA in the public cloud
 
Aims2011 slacc-presentation final-version
Aims2011 slacc-presentation final-versionAims2011 slacc-presentation final-version
Aims2011 slacc-presentation final-version
 
reliability based design optimization for cloud migration
reliability based design optimization for cloud migrationreliability based design optimization for cloud migration
reliability based design optimization for cloud migration
 
5 Cloud Migration Experiences Not to Be Repeated
5 Cloud Migration Experiences Not to Be Repeated5 Cloud Migration Experiences Not to Be Repeated
5 Cloud Migration Experiences Not to Be Repeated
 
Massimiliano Raks, Naples University on SPECS: Secure provisioning of cloud s...
Massimiliano Raks, Naples University on SPECS: Secure provisioning of cloud s...Massimiliano Raks, Naples University on SPECS: Secure provisioning of cloud s...
Massimiliano Raks, Naples University on SPECS: Secure provisioning of cloud s...
 
Cloud migration pattern using microservices
Cloud migration pattern using microservicesCloud migration pattern using microservices
Cloud migration pattern using microservices
 
Tracking SLAs In Cloud
Tracking SLAs In CloudTracking SLAs In Cloud
Tracking SLAs In Cloud
 
Forecast 2014 Keynote: State of Cloud Migration…What's Occurring Now, and Wha...
Forecast 2014 Keynote: State of Cloud Migration…What's Occurring Now, and Wha...Forecast 2014 Keynote: State of Cloud Migration…What's Occurring Now, and Wha...
Forecast 2014 Keynote: State of Cloud Migration…What's Occurring Now, and Wha...
 
Assess enterprise applications for cloud migration
Assess enterprise applications for cloud migrationAssess enterprise applications for cloud migration
Assess enterprise applications for cloud migration
 
Enforcing Application SLA with Congress and Monasca
Enforcing Application SLA with Congress and MonascaEnforcing Application SLA with Congress and Monasca
Enforcing Application SLA with Congress and Monasca
 
How we measure quality of JIRA deployments to Cloud?
How we measure quality of JIRA deployments to Cloud?How we measure quality of JIRA deployments to Cloud?
How we measure quality of JIRA deployments to Cloud?
 
Cloud computing final
Cloud computing finalCloud computing final
Cloud computing final
 
Planning for a (Mostly) Hassle-Free Cloud Migration | VTUG 2016 Winter Warmer
Planning for a (Mostly) Hassle-Free Cloud Migration | VTUG 2016 Winter WarmerPlanning for a (Mostly) Hassle-Free Cloud Migration | VTUG 2016 Winter Warmer
Planning for a (Mostly) Hassle-Free Cloud Migration | VTUG 2016 Winter Warmer
 
SLAs in Virtualized Cloud Computing Infrastructures with QoS Assurance
SLAs in Virtualized Cloud Computing Infrastructures with QoS AssuranceSLAs in Virtualized Cloud Computing Infrastructures with QoS Assurance
SLAs in Virtualized Cloud Computing Infrastructures with QoS Assurance
 
Outsourcing SLA versus Cloud SLA by Jurian Burgers
Outsourcing SLA versus Cloud SLA by Jurian BurgersOutsourcing SLA versus Cloud SLA by Jurian Burgers
Outsourcing SLA versus Cloud SLA by Jurian Burgers
 
Measureable Cloud Migration
Measureable Cloud MigrationMeasureable Cloud Migration
Measureable Cloud Migration
 

Similaire à Autonomic SLA-driven Provisioning for Cloud Applications

Building LinkedIn's Next Generation Architecture with OSGi
Building LinkedIn's Next Generation  Architecture with OSGiBuilding LinkedIn's Next Generation  Architecture with OSGi
Building LinkedIn's Next Generation Architecture with OSGi
LinkedIn
 
Windows server 8 hyper v networking (aidan finn)
Windows server 8 hyper v networking (aidan finn)Windows server 8 hyper v networking (aidan finn)
Windows server 8 hyper v networking (aidan finn)
hypervnu
 
eBay Architecture
eBay Architecture eBay Architecture
eBay Architecture
Tony Ng
 
Ugif 04 2011 ibm informix genero offering v12
Ugif 04 2011   ibm informix genero offering v12Ugif 04 2011   ibm informix genero offering v12
Ugif 04 2011 ibm informix genero offering v12
UGIF
 

Similaire à Autonomic SLA-driven Provisioning for Cloud Applications (20)

Building LinkedIn's Next Generation Architecture with OSGi
Building LinkedIn's Next Generation  Architecture with OSGiBuilding LinkedIn's Next Generation  Architecture with OSGi
Building LinkedIn's Next Generation Architecture with OSGi
 
Windows server 8 hyper v networking (aidan finn)
Windows server 8 hyper v networking (aidan finn)Windows server 8 hyper v networking (aidan finn)
Windows server 8 hyper v networking (aidan finn)
 
eBay Architecture
eBay Architecture eBay Architecture
eBay Architecture
 
Windows Server 8 Hyper V Networking
Windows Server 8 Hyper V NetworkingWindows Server 8 Hyper V Networking
Windows Server 8 Hyper V Networking
 
Tungsten University: Geographically Distributed Multi-Master MySQL Clusters
Tungsten University: Geographically Distributed Multi-Master MySQL ClustersTungsten University: Geographically Distributed Multi-Master MySQL Clusters
Tungsten University: Geographically Distributed Multi-Master MySQL Clusters
 
Much Ado about CPU
Much Ado about CPUMuch Ado about CPU
Much Ado about CPU
 
Much Ado About CPU
Much Ado About CPUMuch Ado About CPU
Much Ado About CPU
 
An economic approach for scalable and highly-available distributed applications
An economic approach for scalable and highly-available distributed applicationsAn economic approach for scalable and highly-available distributed applications
An economic approach for scalable and highly-available distributed applications
 
Esp 100107093030-phpapp02
Esp 100107093030-phpapp02Esp 100107093030-phpapp02
Esp 100107093030-phpapp02
 
Scaling the Container Dataplane
Scaling the Container Dataplane Scaling the Container Dataplane
Scaling the Container Dataplane
 
Identity Summit UK: KEEP TALKING: LESSONS LEARNED DURING OUR MIGRATION FROM L...
Identity Summit UK: KEEP TALKING: LESSONS LEARNED DURING OUR MIGRATION FROM L...Identity Summit UK: KEEP TALKING: LESSONS LEARNED DURING OUR MIGRATION FROM L...
Identity Summit UK: KEEP TALKING: LESSONS LEARNED DURING OUR MIGRATION FROM L...
 
Cooperative VM Migration for a virtualized HPC Cluster with VMM-bypass I/O de...
Cooperative VM Migration for a virtualized HPC Cluster with VMM-bypass I/O de...Cooperative VM Migration for a virtualized HPC Cluster with VMM-bypass I/O de...
Cooperative VM Migration for a virtualized HPC Cluster with VMM-bypass I/O de...
 
OSCON14: Mirage 2.0
OSCON14: Mirage 2.0 OSCON14: Mirage 2.0
OSCON14: Mirage 2.0
 
A Guide to Data Versioning with MapR Snapshots
A Guide to Data Versioning with MapR SnapshotsA Guide to Data Versioning with MapR Snapshots
A Guide to Data Versioning with MapR Snapshots
 
z/VM Platform Update
z/VM Platform Updatez/VM Platform Update
z/VM Platform Update
 
XS Boston 2008 Fault Tolerance
XS Boston 2008 Fault ToleranceXS Boston 2008 Fault Tolerance
XS Boston 2008 Fault Tolerance
 
Ugif 04 2011 ibm informix genero offering v12
Ugif 04 2011   ibm informix genero offering v12Ugif 04 2011   ibm informix genero offering v12
Ugif 04 2011 ibm informix genero offering v12
 
Securing Enterprise Assets In The Cloud
Securing Enterprise Assets In The CloudSecuring Enterprise Assets In The Cloud
Securing Enterprise Assets In The Cloud
 
Introduction to the Linux on System z Terminal Server using z/VM IUCV
Introduction to the Linux on System z Terminal Server using z/VM IUCVIntroduction to the Linux on System z Terminal Server using z/VM IUCV
Introduction to the Linux on System z Terminal Server using z/VM IUCV
 
The NRB Group mainframe day 2021 - IBM Z-Strategy & Roadmap - Adam John Sturg...
The NRB Group mainframe day 2021 - IBM Z-Strategy & Roadmap - Adam John Sturg...The NRB Group mainframe day 2021 - IBM Z-Strategy & Roadmap - Adam John Sturg...
The NRB Group mainframe day 2021 - IBM Z-Strategy & Roadmap - Adam John Sturg...
 

Dernier

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Dernier (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Autonomic SLA-driven Provisioning for Cloud Applications

  • 1. Autonomic SLA-driven Provisioning for Cloud Applications Nicolas Bonvin, Thanasis Papaioannou, Karl Aberer CCGRID 2011, May 23-26 2011, New Port Beach, CA, USA nicolas.bonvin@epfl.ch LSIR - EPFL
  • 2. Cloud Apps – Issue #1 : Placement ● A distributed, component-based application running on an elastic infrastructure [Diagram: components C1, C2, C3, C4] EPFL – LSIR - Nicolas Bonvin
  • 3. Cloud Apps – Issue #1 : Placement ● A distributed, component-based application running on an elastic infrastructure [Diagram: components C1–C4 hosted on VM1, VM2, VM3]
  • 4. Cloud Apps – Issue #1 : Placement ● A distributed, component-based application running on an elastic infrastructure ● The performance of C1, C2 and C3 is probably lower than that of C4 ● No information about other VMs colocated on the same server! [Diagram: components C1–C4 hosted on VM1–VM3; the VMs run on Server 1 and Server 2]
  • 5. Cloud Apps – Issue #1 : Placement ● A distributed, component-based application running on an elastic infrastructure ● The performance of C1, C2 and C3 is probably lower than that of C4 ● No information about other VMs colocated on the same server! ● No control over placement [Diagram: components C1–C4 hosted on VM1–VM3; the VMs run on Server 1 and Server 2]
  • 6. Cloud Apps – Issue #2 : Instability ● Load-balanced traffic to 4 identical components on 4 identical VMs [Diagram: C1 replicated on VM1–VM4; response times 100 ms, 100 ms, 100 ms, 100 ms]
  • 7. Cloud Apps – Issue #2 : Instability ● Load-balanced traffic to 4 identical components on 4 identical VMs – VM performance can vary by up to a factor of 4! [Dej2009] ● Physical server, hypervisor, storage, ... [Diagram: C1 replicated on VM1–VM4; response times 100 ms, 140 ms, 100 ms, 100 ms]
  • 8. Cloud Apps – Issue #2 : Instability ● Load-balanced traffic to 4 identical components on 4 identical VMs – VM performance can vary by up to a factor of 4! [Dej2009] ● Physical server, hypervisor, storage, ... ● Component overloaded [Diagram: C1 replicated on VM1–VM4; response times 130 ms, 140 ms, 100 ms, 100 ms]
  • 9. Cloud Apps – Issue #2 : Instability ● Load-balanced traffic to 4 identical components on 4 identical VMs – VM performance can vary by up to a factor of 4! [Dej2009] ● Physical server, hypervisor, storage, ... ● Component overloaded ● Component bug, crash, deadlock, ... [Diagram: C1 replicated on VM1–VM4; response times 130 ms, 140 ms, 100 ms, infinity]
  • 10. Cloud Apps – Issue #2 : Instability ● Load-balanced traffic to 4 identical components on 4 identical VMs – VM performance can vary by up to a factor of 4! [Dej2009] ● Physical server, hypervisor, storage, ... ● Component overloaded ● Component bug, crash, deadlock, ... ● Failure of C1 on VM4 -> load is rebalanced [Diagram: C1 replicated on VM1–VM4; response times 140 ms, 150 ms, 130 ms, infinity]
  • 11. Cloud Apps – Issue #2 : Instability ● Load-balanced traffic to 4 identical components on 4 identical VMs – VM performance can vary by up to a factor of 4! [Dej2009] ● Physical server, hypervisor, storage, ... ● Component overloaded ● Component bug, crash, deadlock, ... ● Failure of C1 on VM4 -> load is rebalanced ● The application should react early! [Diagram: C1 replicated on VM1–VM4; response times 140 ms, 150 ms, 130 ms, infinity]
  • 12. Cloud Apps – Overview ● Build for failures – Do not trust the underlying infrastructure – Do not trust your components either! ● Components should adapt to changing conditions – Quickly – Automatically – e.g. by replacing a wonky VM with a new one
  • 13. Scarce: a framework to build scalable cloud applications
  • 14. Architecture Overview ● An agent on each server / VM – Starts, stops and monitors the components – Makes decisions on behalf of the components ● An agent communicates with other agents – Routing table – Status of the server (resource usage) [Diagram: agents on the servers exchanging information by gossiping and broadcast]
  • 15. An economic approach ● Time is split into epochs (no synchronization between servers) ● Servers charge a virtual rent for hosting a component according to – Current resource usage (I/O, CPU, ...) of the server – Technical factors (HW, connectivity, ...) – Non-technical factors (country stability, ...)
  • 16. An economic approach ● Time is split into epochs (no synchronization between servers) ● Servers charge a virtual rent for hosting a component according to – Current resource usage (I/O, CPU, ...) of the server – Technical factors (HW, connectivity, ...) – Non-technical factors (country stability, ...) ● Components – Pay virtual rent at each epoch – Gain virtual money by processing requests – Make decisions based on their balance (= gain – rent) ● Replicate, migrate, suicide, stay ● Virtual rents are updated by gossiping (no centralized board)
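The per-epoch decision of a component can be illustrated with a small Python sketch. The slides only state that the choice among replicate, migrate, suicide and stay is driven by the balance (gain minus rent); the concrete rule below (a window of recent balances, comparing rents, keeping at least one replica) and all names such as `decide` and `window` are assumptions made for illustration, not the framework's actual policy.

```python
def decide(balances, current_rent, cheapest_rent_elsewhere, replicas, window=3):
    """Pick an action for one component at the end of an epoch.

    balances: balance (gain - rent) of past epochs, most recent last.
    The thresholds and the window length are hypothetical.
    """
    recent = balances[-window:]
    if len(recent) < window:
        return "stay"  # not enough history to judge
    if all(b > 0 for b in recent):
        return "replicate"  # consistently profitable: the component is busy
    if all(b < 0 for b in recent):
        if cheapest_rent_elsewhere < current_rent:
            return "migrate"  # a cheaper server may restore a positive balance
        if replicas > 1:
            return "suicide"  # unprofitable and redundant
    return "stay"
```

For instance, a component with three positive balances in a row would replicate under this rule, while a component that keeps losing money and is the last replica simply stays.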
  • 17. Economic model (i) ● The rent of a server is different for each component!
  • 18. Economic model (ii) [Diagram: C1 (CPU: 30%, I/O: 5%) has to choose between VM1 (CPU: 70%, I/O: 20%) and VM2 (CPU: 25%, I/O: 65%)] ● VM1 and VM2 have an « identical » resource usage: 45% ● Server rent = the server's resource usage weighted by the component's weights – Rent for C1 @ VM1 > rent for C1 @ VM2 ● Multiplexing of server resources
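The numbers on this slide can be checked with a short Python sketch. It assumes that the component's weights are its own resource usage normalized to sum to 1; the slides do not give the exact weighting, so `rent` and this normalization are illustrative only.

```python
def rent(server_usage, component_usage):
    """Weighted server usage as seen by one component.

    Weights are the component's usage normalized to sum to 1
    (an assumption; the slides do not define the weights).
    """
    total = sum(component_usage.values())
    weights = {res: use / total for res, use in component_usage.items()}
    return sum(weights[res] * server_usage[res] for res in weights)


vm1 = {"cpu": 70.0, "io": 20.0}  # plain average usage: 45%
vm2 = {"cpu": 25.0, "io": 65.0}  # plain average usage: 45%
c1 = {"cpu": 30.0, "io": 5.0}    # mostly CPU-bound component

rent_vm1 = rent(vm1, c1)  # about 62.9
rent_vm2 = rent(vm2, c1)  # about 30.7
```

Although both VMs show the same average usage, the CPU-bound C1 sees VM1 as roughly twice as expensive as VM2, which matches the inequality on the slide.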
  • 19. Economic model (iii) ● Choosing a candidate server j during replication/migration of a component i – net benefit maximization ● 2 optimization goals: – high availability through geographical diversity of replicas – low latency by grouping related components ● g_j: weight related to the proximity of the server location to the geographical distribution of the client requests to the component ● S_i: the set of servers hosting a replica of component i
  • 20. SLA Performance Guarantees (i) ● Each component has its own SLA constraints ● SLA derived directly from entry components [Diagram: dependency graph of components C1–C5 with an SLA of 500 ms] ● Response time = service time + max(response time of dependencies)
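The response-time formula on this slide is recursive, which a few lines of Python make concrete. The dependency graph and the service times below are hypothetical values chosen for illustration; the slide's diagram does not provide them.

```python
def response_time(component, service_time, deps):
    """Response time = service time + max(response time of dependencies)."""
    children = deps.get(component, [])
    # A component without dependencies only contributes its own service time.
    slowest = max(
        (response_time(child, service_time, deps) for child in children),
        default=0.0,
    )
    return service_time[component] + slowest


# Hypothetical graph: C1 calls C2 and C3, C2 calls C4, C3 calls C5.
deps = {"C1": ["C2", "C3"], "C2": ["C4"], "C3": ["C5"]}
# Hypothetical service times in milliseconds.
service_time = {"C1": 100.0, "C2": 80.0, "C3": 120.0, "C4": 150.0, "C5": 90.0}

total = response_time("C1", service_time, deps)  # 100 + max(80 + 150, 120 + 90) = 330.0
```

With these values the entry component answers in 330 ms, so a 500 ms SLA would hold; the `max` reflects the assumption that dependencies are called in parallel.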
  • 21. SLA Performance Guarantees (ii) ● SLA propagation from parents to children ● Parent j sends its performance constraints (e.g. response time upper bound) to its dependencies D(j) [formula lost in extraction] ● Child i computes its own performance constraints [formula lost in extraction] ● [symbol lost in extraction]: group of constraints sent by the replicas of the parent g
  • 22. SLA Performance Guarantees (iii) ● SLA propagation from parents to children
  • 23. Automatic Provisioning ● Usage of allocated resources is maximized: – autonomic migration / replication / suicide of components – not enough to ensure end-to-end response time ● Cloud resources are managed by the framework via the cloud API ● Each individual component has to satisfy its own SLA – SLA easily met -> decrease resources (scale down) – SLA not met -> increase resources (scale up, scale out)
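The scale-down / scale-up rule on this slide could look roughly like the following Python sketch. The `slack` threshold that defines "SLA easily met" and the function name `provisioning_action` are assumptions; the slides do not say how far below the SLA a component must be before resources are released.

```python
def provisioning_action(observed_ms, sla_ms, slack=0.5):
    """Map a component's observed response time to a provisioning action.

    slack is a hypothetical fraction of the SLA below which the SLA
    counts as 'easily met'.
    """
    if observed_ms > sla_ms:
        return "scale_up"    # SLA not met: add resources (scale up or out)
    if observed_ms < slack * sla_ms:
        return "scale_down"  # SLA easily met: release resources
    return "keep"            # SLA met without much margin: leave as is
```

Keeping a band between the two thresholds avoids oscillating between adding and removing resources when the response time hovers near the SLA.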
  • 24. Adaptivity to slow servers ● Each component keeps statistics about its children – e.g. 95th percentile response time ● A routing coefficient is computed for each child at each epoch – Send more requests to better-performing children
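One simple way to obtain such routing coefficients is to make them inversely proportional to each child's 95th percentile response time. This is a sketch under that assumption only; the slides do not give the actual formula, and `routing_coefficients` is a hypothetical name.

```python
def routing_coefficients(p95_ms):
    """Share of requests per child, inversely proportional to its p95 latency."""
    inverse = {child: 1.0 / latency for child, latency in p95_ms.items()}
    total = sum(inverse.values())
    # Normalize so that the coefficients sum to 1.
    return {child: value / total for child, value in inverse.items()}


# Hypothetical measurements for one epoch: the second replica is twice as slow.
coefficients = routing_coefficients({"replica_a": 100.0, "replica_b": 200.0})
```

Here the faster replica receives two thirds of the requests and the slower one a third, so a server that becomes slow is gradually relieved of load without being cut off entirely.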
  • 26. Evaluation: Setup ● 5 components, mostly CPU-intensive (wc >> wm, wn, wd) [Diagram: dependency graph of components C1–C5 with an SLA of 500 ms] ● 8 servers with 8 cores each (Intel Core i7 920, 2.67 GHz, 8 GB, Linux 2.6.32-trunk-amd64) ● d = 0, C = 110, k = 10000, xs* = 25%
  • 27. Adaptation to Varying Load (i) ● 5 rps to 60 rps at minute 8, step 5 rps/min ● Static setup : 2 servers with 2 cores
  • 28. Adaptation to Varying Load (ii) ● 5 rps to 60 rps at minute 8, step 5 rps/min ● Static setup : 2 servers with 2 cores
  • 29. Adaptation to Slow Server ● Max 2 cores/server, 25 rps ● At minute 4, a server gets slower (200 ms delay)
  • 30. Scalability ● Add 5 rps per minute until 150 rps ● Max 6 cores/server
  • 32. Conclusion ● Framework for building cloud applications ● Elasticity : add/remove resources ● High Availability : software, hardware, network failures ● Scalability : growing load, peaks, scaling down, ... – Quick replication of busy components ● Load Balancing : load has to be shared by all available servers – Replication of busy components – Migration of less busy components – Reach equilibrium when load is stable ● SLA performance guarantees – Automatic provisioning ● No synchronization, fully decentralized