Availability Analysis for Deployment of In-Cloud Applications

Xiwei Xu, Qinghua Lu, Liming Zhu, Jim (Zhanwen) Li
Sherif Sakr, Hiroshi Wada, Ingo Weber
Software Systems Research Group, NICTA
ISARCS13, Vancouver
Slides at: http://www.slideshare.net/LimingZhu/

NICTA Copyright 2010 From imagination to impact
Motivation
• Uncertainties in the cloud make it hard to architect
critical applications and to understand their availability
– Shared resources, weak SLA guarantees and limited visibility
– Rare but high consequence events
– Sporadic activities: upgrade, backup, recovery…
– Subjective uncertainties: impact of configuration choices
• We want to explicitly model the above uncertainties in
application availability analysis of cloud deployment.
– from a cloud consumer perspective
– focusing on mechanisms most relevant to critical
applications: auto-scaling, over-provisioning, backup,
recovery and maintenance.

Contributions
• Stochastic Reward Net (SRN)-based availability models
• which allow you to specify:
– Deployment architecture (application placements in VM)
– Node/Aggregation level SLAs from infrastructure providers
– Auto-scaling policies and recovery strategies
– Rare events: availability zone or region down
• which give you application availability levels of different options
under different scenarios
• Model evaluation by analysing existing industry best
practices in cloud application deployment
– Quantifying the rule-of-thumb best practices
– Comparing different (best) practices

Deployment Architecture Assumption
– Stateless VMs: auto-scaling groups
– Stateful VMs: hot standbys
– Backup at separate region for recovery

Availability Analysis Overview
• SRN-based Models
• Architecture model and recovery model in this paper
• One SRN architecture model per availability zone

• Deployment decisions and patterns
– stateless/stateful application placement within VMs
– auto-scaling policies
– multi-zone configurations

• SLA from the cloud providers
• Node level (Rackspace) or zone level (Amazon)

• Recovery strategy
• Auto-regeneration of stateless VMs and different
recovery mechanisms for stateful VMs
• Different Recovery-Time/Point-Objective (RTO/RPO)

• Application-specific data
– Stateless VM start-up time…
– Stateful VM replication…

Stochastic Reward Net
• Stochastic Reward Net (SRN)
– Stochastic Petri Net variant
– Firing delays
– Reward function
• Constructs
• Places: VM states
(Full, Running, Stopped, Failed)
• Token: VMs
• Transition
• Guard function
• Transition rate: 1) frequency of
events, 2) delay before the
transition fires
• Reward Function:
if(#Running1>0) 1 else 0
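
The reward function above can be read as an indicator over markings: the application counts as available whenever at least one token (VM) sits in place Running1. A minimal Python sketch of that reading follows. The dictionary-based marking, the function names and the example probabilities are illustrative assumptions, not part of the authors' model or of any SRN tool.

```python
from typing import Dict, List, Tuple

# A marking maps each place name to the number of tokens (VMs) it holds.
Marking = Dict[str, int]


def reward(marking: Marking) -> int:
    """Indicator reward: 1 if at least one VM is in Running1, else 0."""
    return 1 if marking.get("Running1", 0) > 0 else 0


def expected_reward(state_probs: List[Tuple[Marking, float]]) -> float:
    """Expected steady-state reward: sum of probability * reward per marking."""
    return sum(prob * reward(marking) for marking, prob in state_probs)


# Hypothetical steady-state probabilities, for illustration only.
example = [
    ({"Running1": 2, "Failed1": 0}, 0.9),
    ({"Running1": 0, "Failed1": 2}, 0.1),
]
print(expected_reward(example))
```

With an indicator reward like this, the expected steady-state reward is the availability figure the model reports.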

SRN-based Availability Models

Availability Models: Auto-scaling
gScaleSelf1:
if(#Running1<=#Running2 && #Stopped1>0) 1 else 0
gScaleOther1:
if(#Running1>#Running2 && #Stopped2>0) 1 else 0
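
The two guards can be written as plain predicates over a marking. One plausible reading, which is an assumption here rather than something the slide states, is that a scale-out triggered in zone 1 starts a stopped VM locally when zone 1 has no more running VMs than zone 2, and otherwise starts one in zone 2. The Python names below are hypothetical.

```python
from typing import Dict

Marking = Dict[str, int]  # place name -> token (VM) count


def g_scale_self_1(m: Marking) -> int:
    """gScaleSelf1: enabled when zone 1 runs no more VMs than zone 2
    and zone 1 still has a stopped VM to start."""
    return 1 if m["Running1"] <= m["Running2"] and m["Stopped1"] > 0 else 0


def g_scale_other_1(m: Marking) -> int:
    """gScaleOther1: enabled when zone 1 runs more VMs than zone 2
    and zone 2 still has a stopped VM to start."""
    return 1 if m["Running1"] > m["Running2"] and m["Stopped2"] > 0 else 0


marking = {"Running1": 1, "Running2": 2, "Stopped1": 1, "Stopped2": 0}
print(g_scale_self_1(marking), g_scale_other_1(marking))
```

The first conditions of the two guards are complementary, so at most one of them is enabled in any marking.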

Availability Models: Stateful VM

Availability Models: Disaster Recovery
• Availability zone life cycle
– Interacts with the overall
architecture model
• Stateless VM recovery
– Backup/AMI
• Stateful VM recovery
– Backup
– Replica
– Hot standby

Case 1: Multi-zone Deployment
• Parameters
– Amazon EC2 SLA of 99.95% availability
– Zone failure rate: 0.00011, MTTR: 4.38 hours (the 0.05% of a year that a 99.95% SLA allows as downtime)
– Application specific measurement of transitions
– For scale: 0.01% of availability = 52.56 mins downtime per year
– A 0.4% difference = about 35 hours per year
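
The figures on this slide can be checked with simple arithmetic. The sketch below assumes a 365-day year and, for the last function, that the zone failure rate is expressed per hour and that a zone behaves as a two-state (up/down) component. Neither assumption is stated on the slide, and the function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, assuming a 365-day year


def downtime_hours_per_year(unavailability: float) -> float:
    """Yearly downtime in hours for a given unavailability fraction."""
    return unavailability * HOURS_PER_YEAR


def steady_state_availability(failure_rate_per_hour: float, mttr_hours: float) -> float:
    """Two-state availability: MTTF / (MTTF + MTTR), with MTTF = 1 / failure rate."""
    return 1.0 / (1.0 + failure_rate_per_hour * mttr_hours)


print(downtime_hours_per_year(0.0005))          # 99.95% SLA, in hours
print(downtime_hours_per_year(0.0001) * 60)     # 0.01%, in minutes
print(downtime_hours_per_year(0.004))           # 0.4% difference, in hours
print(steady_state_availability(0.00011, 4.38))
```

Under the per-hour assumption, a failure rate of 0.00011 with a 4.38-hour repair time gives an availability close to the 99.95% SLA figure.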

Case 2: Recovery across Availability Zone
• Industry rule of thumb: "Target auto-scale 30-60% until you have
50% headroom for load spikes. Lose an AZ leads to 90% utilisation."
• Impact on overall availability?
• 30-60% vs. traditional 70-90%?
• over-provisioning vs. auto-scaling?
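
The 90% figure in the rule of thumb is consistent with a simple redistribution argument. The sketch below assumes three availability zones with evenly balanced load, so that the survivors absorb the lost zone's share equally. The zone count and the function name are assumptions, not taken from the slide.

```python
def utilisation_after_zone_loss(utilisation: float, zones: int, lost: int = 1) -> float:
    """Per-VM utilisation after `lost` zones fail and the remaining zones
    absorb their load evenly (assumes equal capacity per zone)."""
    if not 0 < lost < zones:
        raise ValueError("lost must be at least 1 and less than zones")
    return utilisation * zones / (zones - lost)


# Upper end of the 30-60% target, three zones, one zone lost.
print(utilisation_after_zone_loss(0.60, zones=3))
# Upper end of the traditional 70-90% target, same scenario.
print(utilisation_after_zone_loss(0.90, zones=3))
```

Under these assumptions a 60% target lands at about 90% after losing one zone, while a 90% target would need about 135% of the remaining capacity, which is the gap the slide's questions point at.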

Case 3: Disaster Recovery across Regions
• Trade-off between RPO and RTO
• RPO: Recovery Point Objective
• RTO: Recovery Time Objective
Yuruware — http://www.yuruware.com/

Conclusion and Future Work
• SRN-based availability models
– Application-level availability
– Highly configurable for different deployment architectures
– Model different uncertainties and scenarios for critical systems
– Quantify and compare choices and enable what-if analysis
– Evaluated using industry best practices
• Future work
– Better evaluation!
– Integrated models on impact of upgrade, live migration, backup and
subjective uncertainties (in IEEE Cloud 13)
Q. Lu, X. Xu, L. Zhu, L. Bass, et al., "Incorporating Uncertainty into in-Cloud Application
Deployment Decisions for Availability," in IEEE Cloud 2013
Liming.Zhu@nicta.com.au
Slides available at http://www.slideshare.net/LimingZhu/