An SLA is only useful if it guarantees a certain level of quality. Current Cloud SLAs cover availability but ignore a key ingredient: Response Time and Throughput Performance. A Performance SLA would need to relate to the application's performance itself, something no Cloud Provider has control over. We will discuss how Application Performance Monitoring can be used to define, measure and enforce an SLA that is usable for both sides. We will talk about the differences between IaaS and PaaS cloud providers concerning such an SLA. We will also show how this leads to a better User Experience with less R&D effort. Finally, it enables us to easily compare cloud performance across vendors in terms that really matter: Response Time per Cost.
SLAs and Performance in the Cloud: Because There is More Than "Just" Availability
[February 20, 2012]
2. Everybody wants a Cloud SLA
• Security
• Availability
• Performance
Amazon did not violate its SLA
3. Current State of Cloud SLAs
• GoGrid
– 100% Server Uptime, Credit is given if violated
– Reasonable efforts to ensure that server storage is "persistent"
– Network Latency SLA, Credit for prepaid fees
– Load Balancer: Uptime, latency and throughput
• Rackspace
– 100% Network uptime, excluding scheduled maintenance
– Server outage repaired within the hour
• Amazon:
– 99.95% Annual Regional Availability; a Region counts as unavailable when:
• more than 1 Zone in the same Region is unavailable
• instances have no outside connectivity for at least 5 minutes
• the API is not available to start new instances
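To see what these terms mean in practice, the percentage can be turned into a yearly downtime budget and the conditions into a predicate. A minimal sketch, assuming a 365-day year and our own reading that all three conditions must hold together; `ZoneStatus` and the helper functions are hypothetical, not an AWS API.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def allowed_downtime_hours(availability_pct: float) -> float:
    """Yearly downtime that still meets the given availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


@dataclass
class ZoneStatus:  # hypothetical data model for illustration only
    name: str
    minutes_without_connectivity: int  # instances unreachable from outside
    can_launch_instances: bool         # API can still start new instances


def zone_unavailable(zone: ZoneStatus) -> bool:
    # Assumption: connectivity lost for 5+ minutes AND no new instances possible.
    return zone.minutes_without_connectivity >= 5 and not zone.can_launch_instances


def region_unavailable(zones: list[ZoneStatus]) -> bool:
    # More than one zone of the same region must be unavailable at the same time.
    return sum(zone_unavailable(z) for z in zones) > 1
```

`allowed_downtime_hours(99.95)` comes to about 4.38 hours (roughly 4 h 23 min) per year. Under this reading, an outage confined to a single zone never counts against the 99.95%, however long it lasts.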
5. Priorities have changed!
• I don’t care about the underlying Hardware
• Focus is on Business Value: My own Application
• Performance Management reflects that
Performance SLA must impact Application
6. Meaningful SLAs
• Application Performance
– End-to-End Response Time
– Throughput
• Application Availability
– Reachable by the End Users
Performance SLA is Application specific
Cloud SLA cannot cover that directly
7. Possible Cloud Performance SLAs
• IaaS
– Guaranteed Capacity (CPU, Memory, Bandwidth…)
– Guaranteed Latencies (Network, Load balancer, Disk…)
– Meaning and Enforcement outside app context?
• PaaS
– Guarantees on Application Interfaces
– Meaning and Enforcement outside app context?
8. Side Effect of missing Performance SLAs
No viable way to compare Price/Performance
between multiple providers
13. Cloud Performance SLA
• Response Time SLA is Application based
• Latencies can be measured in the Application
• Latency SLA impacts Application and is enforceable
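Measuring a latency SLA from inside the application can be as simple as timing every call the application makes to a cloud service and checking a percentile against the agreed threshold. A minimal sketch, assuming a percentile-based threshold in milliseconds; `LatencyRecorder` and `sla_violated` are hypothetical names, not part of any APM product.

```python
import math
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class LatencyRecorder:
    """Collects latencies (ms) of the application's calls to one cloud service."""

    def __init__(self) -> None:
        self.samples_ms: list[float] = []

    def timed(self, call: Callable[[], T]) -> T:
        """Runs call() and records its latency, even if it raises."""
        start = time.perf_counter()
        try:
            return call()
        finally:
            self.samples_ms.append((time.perf_counter() - start) * 1000)

    def percentile(self, p: float) -> float:
        """Nearest-rank percentile for 0 < p <= 100."""
        if not self.samples_ms:
            raise ValueError("no samples recorded")
        ordered = sorted(self.samples_ms)
        rank = max(math.ceil(p * len(ordered) / 100), 1)
        return ordered[rank - 1]


def sla_violated(recorder: LatencyRecorder, p: float, threshold_ms: float) -> bool:
    """True if the p-th percentile latency exceeds the agreed threshold."""
    return recorder.percentile(p) > threshold_ms
```

In use, each outbound call is wrapped, e.g. `data = storage_latency.timed(lambda: fetch_from_storage(key))`, where `fetch_from_storage` stands in for whatever client call the application actually makes. Because the measurement happens in the application, the same numbers show both the SLA status and its impact on the transaction.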
16. Cloud Performance SLA
• CPU Usage can be measured in the Application
(Attention: this is not utilization!)
• Capacity SLA is measurable and enforceable
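The distinction on the slide can be made concrete: the application can measure how much CPU time a transaction actually consumed, while utilization is a host-level percentage. A minimal sketch using the standard `time.thread_time()` and `time.perf_counter()` clocks; `measure` and `TransactionTiming` are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TransactionTiming:
    wall_ms: float  # elapsed time, as the user experiences it
    cpu_ms: float   # CPU time the executing thread actually consumed

    @property
    def wait_ms(self) -> float:
        """Time not spent executing on a CPU (I/O, locks, or not being scheduled)."""
        return max(self.wall_ms - self.cpu_ms, 0.0)


def measure(transaction: Callable[[], object]) -> TransactionTiming:
    """Runs one transaction on the current thread and records both clocks."""
    wall_start = time.perf_counter()
    cpu_start = time.thread_time()
    transaction()
    cpu_ms = (time.thread_time() - cpu_start) * 1000
    wall_ms = (time.perf_counter() - wall_start) * 1000
    return TransactionTiming(wall_ms=wall_ms, cpu_ms=cpu_ms)
```

If CPU time per transaction stays flat while wall time grows, the extra time went into waiting rather than computing, which is one hint that the transaction did not get the capacity it needed.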
19. Benefits of APM for Cloud Applications?
• Identify Performance Problems End-to-End!
• Determine Cloud vs. Application Issue
• Enforce Cloud Performance SLA
• Enforce Third-Party SLAs
Optimization can reduce the number of instances
Reduces Cost!
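The cost argument is simple arithmetic: the number of instances is the target throughput divided by what one instance can sustain, rounded up. A sketch with purely hypothetical figures (500 transactions/s target, $0.50 per instance-hour, an assumed 730-hour month):

```python
import math

HOURS_PER_MONTH = 730  # assumption: average month, 8760 h / 12


def instances_needed(target_tps: float, tps_per_instance: float) -> int:
    """Instances required to sustain the target throughput (rounded up)."""
    return math.ceil(target_tps / tps_per_instance)


def monthly_cost(target_tps: float, tps_per_instance: float,
                 usd_per_instance_hour: float) -> float:
    """Monthly compute cost for the target throughput."""
    instances = instances_needed(target_tps, tps_per_instance)
    return instances * usd_per_instance_hour * HOURS_PER_MONTH


# Hypothetical: optimizing the code raises per-instance throughput from 40 to 60 tps.
before = monthly_cost(500, 40, 0.50)  # 13 instances
after = monthly_cost(500, 60, 0.50)   # 9 instances
```

With these made-up numbers the optimization drops the fleet from 13 to 9 instances, i.e. from $4,745 to $3,285 per month.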
20. Side Effect: A Price Performance Index
• Dollar Value for acceptable Performance:
90th percentile response time / (Total Cost / Number of Transactions)
Desired Throughput / Total Cost
– Mind Volatility
– Price Performance Index is comparable
• Cost Scalability
– Cost per Transaction must remain stable
Performance is no longer defined by Capacity
It is a function of desired User Experience and associated Cost
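Both index formulas and the cost-scalability rule can be computed directly from load-test results. A minimal sketch, assuming one `ProviderRun` record (a hypothetical data model) per provider and load level, a nearest-rank 90th percentile, and an arbitrary 10% tolerance for "stable" cost per transaction. Because cloud performance is volatile, each figure should come from several runs rather than one.

```python
import math
from dataclasses import dataclass


@dataclass
class ProviderRun:
    """One load test of the same application on one provider (hypothetical model)."""
    provider: str
    response_times_ms: list[float]
    transactions: int
    total_cost_usd: float
    desired_throughput_tps: float


def p90(values: list[float]) -> float:
    """Nearest-rank 90th percentile."""
    if not values:
        raise ValueError("no response times")
    ordered = sorted(values)
    return ordered[math.ceil(90 * len(ordered) / 100) - 1]


def cost_per_transaction(run: ProviderRun) -> float:
    return run.total_cost_usd / run.transactions


def response_time_index(run: ProviderRun) -> float:
    # 90th percentile response time / (Total Cost / Number of Transactions)
    return p90(run.response_times_ms) / cost_per_transaction(run)


def throughput_index(run: ProviderRun) -> float:
    # Desired Throughput / Total Cost
    return run.desired_throughput_tps / run.total_cost_usd


def cost_scales(runs: list[ProviderRun], tolerance: float = 0.10) -> bool:
    """Cost per transaction stays within `tolerance` of the first (lowest-load) run."""
    costs = [cost_per_transaction(r) for r in runs]
    baseline = costs[0]
    return all(abs(c - baseline) / baseline <= tolerance for c in costs)
```

Running the same application and load profile on each provider yields indices that are directly comparable, which is exactly what capacity-based figures cannot offer.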
This means that two or more applications can impact each other. This impact is hidden from your application: all it sees is that it slows down or that it doesn't get 100% CPU. Even more, things that could not affect each other before now can: network and I/O.
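On Linux guests, one place where this hidden contention can become visible is the "steal" column of the aggregate `cpu` line in `/proc/stat`: time during which the hypervisor ran something else while this virtual CPU wanted to run. A minimal sketch that computes the steal share between two snapshots; whether a given cloud reports steal time at all depends on the hypervisor, so a zero should be read as "unknown" rather than "no contention".

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat (Linux):
# user nice system idle iowait irq softirq steal guest guest_nice
STEAL = 7


def parse_cpu_line(line: str) -> list[int]:
    parts = line.split()
    if not parts or parts[0] != "cpu" or len(parts) < 9:
        raise ValueError("expected the aggregate 'cpu' line with a steal field")
    return [int(x) for x in parts[1:]]


def steal_share(before: str, after: str) -> float:
    """Fraction of CPU time stolen by the hypervisor between two snapshots."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    # Sum only the first 8 fields: guest time is already included in user/nice.
    total = sum(a[:8]) - sum(b[:8])
    return (a[STEAL] - b[STEAL]) / total if total > 0 else 0.0


def sample_steal(interval_s: float = 1.0, path: str = "/proc/stat") -> float:
    """Reads the aggregate cpu line twice, interval_s apart (Linux only)."""
    with open(path) as f:
        before = f.readline()
    time.sleep(interval_s)
    with open(path) as f:
        after = f.readline()
    return steal_share(before, after)
```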