SLAs and Performance in the Cloud: Because There is More Than "Just" Availability


An SLA is only useful if it guarantees a certain level of quality. Current cloud SLAs cover availability but ignore a key ingredient: response time and throughput performance. A performance SLA would need to relate to the application's own performance, something that no cloud provider has control over. We will discuss how Application Performance Monitoring can be used to define, measure and enforce an SLA that is usable for both sides. We will talk about the differences between IaaS and PaaS cloud providers concerning such an SLA. We will also show how this leads to a better user experience with less R&D effort. Finally, it lets us compare cloud performance across vendors in terms that really matter: response time per cost.


SLAs and Performance in the Cloud: Because There is More Than “Just” Availability
February 20, 2012

Everybody wants a Cloud SLA

• Security
• Availability
• Performance

Amazon did not violate its SLA

Current State of Cloud SLAs

• GoGrid
– 100% Server Uptime, Credit is given if violated
– Reasonable efforts to ensure that server storage is “persistent”
– Network Latency SLA, Credit for prepaid fees
– Load Balancer: Uptime, latency and throughput
• Rackspace
– 100% Network availability, excluding scheduled maintenance
– Server outage repaired within the hour
• Amazon
– 99.95% Annual Regional Availability
  • More than 1 Zone in the same Region is unavailable
  • Instances have no outside connectivity for at least 5 minutes
  • API is not available to start new instances
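To put the availability percentages above into perspective, the following minimal sketch converts an availability guarantee into the downtime a provider may accumulate without violating it. The function name and the 365-day year are assumptions made for illustration; they do not come from any provider's SLA text.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime (in hours) that still satisfies the guarantee."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return period_hours * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for pct in (99.95, 100.0):
        print(f"{pct}% availability allows "
              f"{allowed_downtime_hours(pct):.2f} hours of downtime per year")
```

Under these assumptions a 99.95% annual guarantee still permits roughly 4.4 hours of outage per year, and it says nothing about how fast the service responds while it is "available".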


No Capacity Guarantees

[Two charts of Response Time and Throughput, 09:00 to 09:18; annotations: “Steal Time!”, “Shared Resources!”]
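The “Steal Time!” annotation refers to CPU time that the hypervisor gave to other tenants while the guest wanted to run. As a minimal sketch, assuming a Linux guest, steal time can be derived from two samples of the aggregate `cpu` line in `/proc/stat`, where the eighth value is the steal counter. The function names are hypothetical, and the sample lines in the usage note are made up for illustration.

```python
from typing import List


def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into its counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def steal_percent(before: str, after: str) -> float:
    """Percentage of CPU time stolen between two samples of the 'cpu' line."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    if len(first) < 8 or len(second) < 8:
        raise ValueError("kernel does not report a steal column")
    # Columns: user nice system idle iowait irq softirq steal.
    # Guest columns are skipped because they are already part of user/nice.
    total = sum(second[:8]) - sum(first[:8])
    if total <= 0:
        return 0.0
    return 100.0 * (second[7] - first[7]) / total
```

For example, `steal_percent("cpu 100 0 50 800 10 0 0 40 0 0", "cpu 150 0 70 1500 20 0 0 260 0 0")` evaluates the two invented samples; on a real guest the two lines would be read from `/proc/stat` a few seconds apart.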

Priorities have changed!

• I don’t care about the underlying Hardware
• Focus is on Business Value → My own Application
• Performance Management reflects that

Performance SLA must impact Application


Meaningful SLAs

• Application Performance
– End-to-End Response Time
– Throughput
• Application Availability
– Reachable by the End Users

Performance SLA is Application specific

Cloud SLA cannot cover that directly
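The three application-level measures named on this slide can be computed from plain request records. The sketch below is one possible way to do it, assuming a hypothetical `Request` record with a duration and a success flag; it uses the nearest-rank method for the percentile, and it treats availability as the share of successful requests, which is a simplification of "reachable by the end users".

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Request:
    """Hypothetical record of one end-user request."""
    duration_ms: float
    succeeded: bool


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sequence."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def summarize(requests: Sequence[Request], window_seconds: float) -> Dict[str, float]:
    """Response time, throughput and availability for one time window."""
    if not requests or window_seconds <= 0:
        raise ValueError("need requests and a positive window")
    succeeded = sum(1 for r in requests if r.succeeded)
    return {
        "p90_ms": percentile([r.duration_ms for r in requests], 90.0),
        "throughput_per_s": len(requests) / window_seconds,
        "availability_pct": 100.0 * succeeded / len(requests),
    }
```

All three numbers describe the application rather than the infrastructure, which is why a provider cannot promise them directly.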


Possible Cloud Performance SLAs

• IaaS
– Guaranteed Capacity (CPU, Memory, Bandwidth…)
– Guaranteed Latencies (Network, Load balancer, Disk…)
– Meaning and Enforcement outside app context?
• PaaS
– Guaranteed on Application Interfaces
– Meaning and Enforcement outside app context?


Side Effect of missing Performance SLAs

No viable way to compare Price/Performance
between multiple providers


What we care about

Faster is not better
But slow is bad


End-to-End Response Time Performance

• User Click
• On the Web Server
• In the Application
• In the Cloud

Application Response Time

Cloud DB
Latency Performance


Cloud Performance SLA

• Response Time SLA is Application based
• Latencies can be measured in the Application
• Latency SLA impacts Application and is enforceable
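Measuring latencies inside the application can be as simple as timing each call to a cloud service and comparing the samples with an agreed limit. The following is a minimal sketch of that idea, not how any particular APM product works; `LatencyRecorder`, the call name `"cloud_db"` and the limits are all hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class LatencyRecorder:
    """Collects latency samples (in milliseconds) per named remote call."""

    def __init__(self):
        self.samples = defaultdict(list)

    @contextmanager
    def measure(self, name):
        """Time the enclosed block and store the result under `name`."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.samples[name].append(elapsed_ms)

    def violations(self, name, limit_ms):
        """Return all samples for `name` that exceeded the latency limit."""
        return [ms for ms in self.samples[name] if ms > limit_ms]


if __name__ == "__main__":
    recorder = LatencyRecorder()
    with recorder.measure("cloud_db"):
        time.sleep(0.02)  # stand-in for a real call to a cloud database
    print(recorder.violations("cloud_db", limit_ms=5.0))
```

Because the samples are taken in the context of real transactions, a violated limit can be tied directly to its effect on the application's response time.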


AWS Elastic Map/Reduce Performance

http://blog.dynatrace.com/2012/01/25/about-the-performance-of-map-reduce-jobs/

Cloud Performance SLA

• CPU Usage can be measured in the Application
(Attention: this is not utilization!)
• Capacity SLA is measurable and enforceable
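The distinction between CPU usage and utilization can be illustrated with Python's standard clocks: `time.process_time()` counts the CPU time the process actually consumed, while `time.perf_counter()` counts elapsed wall-clock time. This is only a sketch of the measurement principle; the helper name `measure` is an assumption.

```python
import time


def measure(func, *args):
    """Run func and return (result, cpu_seconds, wall_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return result, cpu_used, wall_used


if __name__ == "__main__":
    # Waiting consumes wall-clock time but almost no CPU time.
    _, cpu, wall = measure(time.sleep, 0.05)
    print(f"sleep: cpu={cpu:.4f}s wall={wall:.4f}s")

    # Computing consumes CPU time.
    _, cpu, wall = measure(sum, range(2_000_000))
    print(f"sum:   cpu={cpu:.4f}s wall={wall:.4f}s")
```

If the same work suddenly needs much more wall-clock time while its consumed CPU time stays constant, the application was waiting for CPU it did not get, which is the kind of evidence a capacity SLA would need.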


Putting Cloud Monitoring in Context
Steal Time or out of CPU?

Cause for Latency

Benefits of APM for Cloud Application?

• Identify Performance Problems End-to-End!
• Determine Cloud vs. Application Issue
• Enforce Cloud Performance SLA
• Enforce Third-Party SLAs

Optimization can reduce the number of instances
Reduces Cost!


Side Effect: A Price Performance Index

• Dollar Value for acceptable Performance:
– 90th percentile response time / (Total Cost / Number of Transactions)
– Desired Throughput / Total Cost
– Mind Volatility
– Price Performance Index is comparable
• Cost Scalability
– Cost per Transaction must remain stable

Performance is no longer defined by Capacity
It is a function of desired User Experience and associated Cost
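The two index formulas from this slide can be written down directly. The sketch below implements them as stated; the function names, units and example figures are assumptions, and the stability check is just one simple way to read "Cost per Transaction must remain stable".

```python
from typing import Sequence


def cost_per_transaction(total_cost: float, transactions: int) -> float:
    """Total Cost / Number of Transactions."""
    if total_cost <= 0 or transactions <= 0:
        raise ValueError("cost and transaction count must be positive")
    return total_cost / transactions


def response_time_index(p90_response_time_s: float, total_cost: float,
                        transactions: int) -> float:
    """90th percentile response time / (Total Cost / Number of Transactions)."""
    return p90_response_time_s / cost_per_transaction(total_cost, transactions)


def throughput_per_cost(desired_throughput: float, total_cost: float) -> float:
    """Desired Throughput / Total Cost."""
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return desired_throughput / total_cost


def cost_is_stable(costs_per_transaction: Sequence[float], tolerance: float) -> bool:
    """True if cost per transaction varies by at most `tolerance` (0.1 = 10%)."""
    lowest, highest = min(costs_per_transaction), max(costs_per_transaction)
    return highest / lowest - 1.0 <= tolerance
```

Computing the same figures for identical workloads on several providers gives numbers that can be compared side by side, provided the measurements are repeated often enough to account for volatility.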

Questions

THANK YOU
Michael Kopp
Michael.kopp@dynaTrace.com
http://blog.dynatrace.com
@mikopp


Slide titles not captured above:

9. APM to the Rescue
14. Capacity Usage: Used CPU Time
17. Detect application hotspots

Editor's notes

This means that two or more applications can impact each other. This impact is hidden from your application; all it sees is that it slows down or that it doesn't get 100% CPU. Moreover, things can now affect each other that couldn't before: network and I/O.