AWS Summit 2011: Designing Fault-Tolerant Applications

Designing Fault-Tolerant Applications
Miles Ward
Enterprise Solutions Architect

Building Fault-Tolerant Applications on AWS

White paper published last year

Sharing best practices

We’d like to hear your best practices as well

http://media.amazonwebservices.com/AWS_Building_Fault_Tolerant_Applications.pdf
Copyright © 2011 Amazon Web Services

AWS Fault-Tolerant Building Blocks
Two approaches:
1) AWS services that are inherently fault-tolerant and highly available:
• Amazon Simple Storage Service (S3)
• Amazon SimpleDB
• Amazon SQS, SNS, SES, CloudWatch, CloudFront, and more.

2) AWS services that offer tools and features to design fault-tolerant and highly available systems:
• Amazon Elastic Compute Cloud (EC2)
– Availability Zones, Elastic IPs, EBS, etc.
– Flexible to trade off budget vs. time to recovery
• Amazon Relational Database Service (RDS)
– Multi-AZ Deployments
– Backup/Restore


Amazon EC2 Architecture

[Diagram. Labels: Amazon Region, Availability Zone, EC2 Instance, Machine Image (AMI), Ephemeral Storage, Elastic Block Storage, Security Group(s), Elastic IP Address, CloudWatch, Auto Scaling, Load Balancing, Amazon S3, EBS Snapshot.]

EC2 Features

AMI
• Packaged, reusable functionality

On-Instance Storage
• Lifetime tied to instance lifetime
• AFR (annual failure rate) like a standard hard disk (around 5%)

EBS Volumes
• Lifetime independent of any particular EC2 instance
• Redundant within an AZ
• AFR is 0.1% to 0.5%
• Incorporate volume mappings into your architecture
• Use EBS snapshot backups
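The AFR figures above can be turned into a rough yearly failure estimate. The following is a minimal sketch, assuming failures are independent and the quoted rates hold across a whole fleet; the function and constant names are hypothetical and not part of any AWS API.

```python
def expected_annual_failures(device_count: int, afr: float) -> float:
    """Expected number of device failures in one year for a given AFR."""
    if device_count < 0:
        raise ValueError("device_count must be non-negative")
    if not 0.0 <= afr <= 1.0:
        raise ValueError("afr must be a fraction between 0 and 1")
    return device_count * afr


def chance_of_any_failure(device_count: int, afr: float) -> float:
    """Probability that at least one device fails in a year (independence assumed)."""
    return 1.0 - (1.0 - afr) ** device_count


# AFR figures quoted on the slide, expressed as fractions.
ON_INSTANCE_AFR = 0.05
EBS_AFR_LOW, EBS_AFR_HIGH = 0.001, 0.005
```

Under these assumptions, a fleet of 1,000 on-instance disks would expect about 50 failures a year, while 1,000 EBS volumes would expect about 1 to 5. Either way the count is not zero, which is why snapshot backups still matter.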


EC2 Features

Elastic IP Addresses
• Map to any EC2 instance within a given Region
• Detach from failed instance; map to replacement

Auto Scaling
• Two ways to use it:
– Respond to changing conditions by adding or terminating EC2 instances (attach to CloudWatch metrics)
– Maintain a fixed number of instances running, replacing them if they fail or become unhealthy
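The second usage, keeping a fixed number of instances running, amounts to a reconcile loop. Here is a minimal in-memory sketch of that idea. It does not call AWS; `Instance`, `reconcile`, and the generated instance IDs are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterator, List


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


def reconcile(fleet: List[Instance], desired: int, new_ids: Iterator[str]) -> List[Instance]:
    """Return a fleet of exactly `desired` healthy instances.

    Unhealthy instances are dropped, replacements are launched until the
    target is met, and any surplus beyond `desired` is trimmed.
    """
    survivors = [instance for instance in fleet if instance.healthy]
    while len(survivors) < desired:
        survivors.append(Instance(instance_id=next(new_ids)))
    return survivors[:desired]


def id_generator(prefix: str = "i-new") -> Iterator[str]:
    """Yield made-up instance IDs: i-new1, i-new2, ..."""
    return (f"{prefix}{n}" for n in count(1))
```

Running `reconcile` on a three-instance fleet with one unhealthy member and `desired=3` keeps the two healthy instances and adds one replacement.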

Reserved Instances
• Guarantees capacity for when it’s needed


EC2 Features

CloudWatch Alarms


EC2 Features

Elastic Load Balancing
• Distributes incoming traffic across multiple instances
• Sends traffic only to healthy instances
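The two behaviors above can be illustrated with a small round-robin router that skips unhealthy targets. This is a sketch under stated assumptions: the slides do not describe the routing algorithm Elastic Load Balancing actually uses, and the `HealthAwareBalancer` class is hypothetical.

```python
from typing import Dict, List


class HealthAwareBalancer:
    """Round-robin over registered instances, skipping unhealthy ones."""

    def __init__(self, instance_ids: List[str]) -> None:
        self._order = list(instance_ids)
        self._health: Dict[str, bool] = {iid: True for iid in self._order}
        self._next = 0

    def report_health(self, instance_id: str, healthy: bool) -> None:
        """Record the result of a health check for one instance."""
        if instance_id not in self._health:
            raise KeyError(f"unknown instance {instance_id!r}")
        self._health[instance_id] = healthy

    def route(self) -> str:
        """Return the next healthy instance, or raise if none is healthy."""
        for _ in range(len(self._order)):
            candidate = self._order[self._next]
            self._next = (self._next + 1) % len(self._order)
            if self._health[candidate]:
                return candidate
        raise RuntimeError("no healthy instances available")
```

If `web-2` out of `web-1`, `web-2`, `web-3` is reported unhealthy, successive calls to `route()` alternate between `web-1` and `web-3` until `web-2` is reported healthy again.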


Amazon EC2 Regions and Availability Zones

[Diagram: the US East (Northern Virginia) and EU (Dublin) Regions, each divided into Availability Zones (labels: Zone A, Zone B, Zone C, Zone D).]

Amazon EC2 Regions:
US East (Northern Virginia) / US West (Northern California) /
EU (Ireland) / Asia Pacific (Singapore) / Asia Pacific (Tokyo)


Availability Zone Characteristics and Advice

Distinct physical locations

Low-latency network connections between AZs

Independent power, cooling, network, security

Always partition app stacks across 2 or more AZs

Elastic Load Balance across instances in multiple AZs
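One way to act on the "2 or more AZs" advice is to spread each tier as evenly as possible and check what remains if the largest zone is lost. The sketch below is illustrative only; the function names are hypothetical and the zone names in the usage note are examples.

```python
from typing import Dict, List


def spread_across_zones(instance_count: int, zones: List[str]) -> Dict[str, int]:
    """Divide instances as evenly as possible across at least two zones."""
    if len(zones) < 2:
        raise ValueError("use two or more Availability Zones")
    if instance_count < 0:
        raise ValueError("instance_count must be non-negative")
    base, extra = divmod(instance_count, len(zones))
    # The first `extra` zones each take one additional instance.
    return {zone: base + (1 if index < extra else 0)
            for index, zone in enumerate(zones)}


def capacity_after_zone_loss(placement: Dict[str, int]) -> int:
    """Instances left running if the most heavily loaded zone fails."""
    return sum(placement.values()) - max(placement.values())
```

For example, five instances across `us-east-1a` and `us-east-1b` are placed three and two, so losing the larger zone leaves two instances serving traffic. Size the tier so that the remaining capacity is still enough.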


Proper Use of Multiple Availability Zones
[Diagram: incoming requests reach an Elastic Load Balancer, which sends requests and health checks to matching stacks in Availability Zone A and Availability Zone B. Each stack contains a Web Server, an App Server, and a Database Server or RDS DB Instance. Centralized Services (S3 backups, SimpleDB, etc.) are shown alongside both zones.]

Region Characteristics and Advice

Regions are:
• Functionally separate
• Composed of 2 or more AZs
• Connected via the public internet

Use regions to:
• Have functionality geographically close to customers
• Comply with national laws and practices
• Implement a DR strategy

RDS Fault-Tolerant Features

Multi-AZ Deployments
• Synchronous replication across AZs
• Automatic fail-over to standby replica
Automated Backups
• Enable point-in-time recovery of the DB instance
• Retention period configurable
Snapshots
• User-initiated full backup of the DB
• New DB can be created from snapshots
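The link between the retention period and point-in-time recovery can be sketched as a simple window check. This is a simplified model, not the RDS API: the function names are hypothetical, and the service's actual earliest and latest restorable times may differ from this plain "now minus retention" window.

```python
from datetime import datetime, timedelta
from typing import Tuple


def restore_window(now: datetime, retention_days: int) -> Tuple[datetime, datetime]:
    """Return the (earliest, latest) restorable times under this simple model."""
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1 in this sketch")
    return now - timedelta(days=retention_days), now


def can_restore_to(target: datetime, now: datetime, retention_days: int) -> bool:
    """True if `target` falls inside the modeled retention window."""
    earliest, latest = restore_window(now, retention_days)
    return earliest <= target <= latest
```

With a 7-day retention period, a target three days in the past is inside the window, while a target eight days in the past is not. A user-initiated snapshot is the way to keep a restore point beyond the window.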

AWS Architectural Guidance


Design For Failure – Basic Principles

Avoid single points of failure

Assume everything fails, and design backwards

Goal: Applications should continue to function even if the
underlying physical hardware fails or is removed or
replaced.

Design your recovery process

Trade off business needs vs. cost of high availability
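The cost of a single point of failure, and the benefit of redundancy, can be estimated with basic availability arithmetic. The sketch below assumes components fail independently, which real systems only approximate; the function names and the 99% figure in the usage note are illustrative.

```python
from math import prod


def serial_availability(*component_availabilities: float) -> float:
    """Availability of a chain where every component must be up."""
    return prod(component_availabilities)


def redundant_availability(single: float, copies: int) -> float:
    """Availability of `copies` interchangeable components where one suffices."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - single) ** copies
```

Two 99% components in series give about 98.01%, which is worse than either alone. Two redundant copies of a 99% component give about 99.99%. Each extra copy costs money, which is the business trade-off named above.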


Design For Failure – Use AWS Building Blocks

Use Elastic IP addresses for consistent and re-mappable routes
Use multiple Amazon EC2 Availability Zones (AZs)
Replicate data across multiple AZs
• Example: Amazon RDS Multi-AZ mode

Use real-time monitoring (Amazon CloudWatch)
Use Amazon Elastic Block Store (EBS) for persistent
file systems
Take EBS Snapshots and use S3 for backups


Build Loosely Coupled Systems

Use independent components

Design everything as a Black Box

Load-balance and scale clusters

Think about graceful degradation

Amazon SQS as Buffers
[Diagram: under Tight Coupling, Controller A, Controller B, and Controller C are connected directly. Under Loose Coupling using Queues, the same three controllers exchange work through queues (Q).]
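The buffering effect of a queue between controllers can be shown with a small in-process sketch. Python's standard `queue.Queue` stands in for Amazon SQS here, and the controller functions and their "work" are hypothetical. The point is only that Controller A can finish while Controller B is not running, and B picks up the backlog later.

```python
import queue
from typing import Iterable, List


def controller_a(jobs: Iterable[str], outbox: queue.Queue) -> None:
    """Produce work and hand it off without knowing who consumes it."""
    for job in jobs:
        outbox.put(job.upper())


def controller_b(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Process whatever has accumulated, then stop when the inbox is empty."""
    while True:
        try:
            job = inbox.get_nowait()
        except queue.Empty:
            return
        outbox.put(f"{job}!")


def drain(q: queue.Queue) -> List[str]:
    """Collect every message currently waiting in a queue."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items
```

Calling `controller_a` first leaves its output waiting in the queue. `controller_b` can then be started at any later time, restarted, or run in several copies without Controller A changing.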

Implement Elasticity

Don’t assume health or fixed location of components

Use designs that are resilient to reboot and re-launch

Bootstrap your instances:
• “Who am I and what is my role?”

Enable dynamic configuration

Use configurations in SimpleDB for bootstrapping

Use Auto Scaling

Use Elastic Load Balancing on each tier
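The bootstrapping idea, where an instance asks "who am I and what is my role?" and then pulls its configuration, can be sketched as a lookup keyed by role. In the sketch below a plain dictionary stands in for the SimpleDB configuration store, and the roles, settings, and `bootstrap` function are hypothetical.

```python
from typing import Any, Dict

# Stand-in for configuration kept in a central store such as SimpleDB.
ROLE_CONFIG: Dict[str, Dict[str, Any]] = {
    "web": {"packages": ["nginx"], "port": 80},
    "app": {"packages": ["python3"], "port": 8080},
}


def bootstrap(launch_data: Dict[str, str],
              config_store: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Resolve this instance's role and return the settings it should apply."""
    role = launch_data.get("role")
    if role is None:
        raise ValueError("launch data does not name a role")
    if role not in config_store:
        raise LookupError(f"no configuration stored for role {role!r}")
    settings = dict(config_store[role])  # copy so the store is not mutated
    settings["role"] = role
    return settings
```

Because the settings are looked up at launch rather than baked into the image, the same AMI can serve several roles, and a replacement instance configures itself without manual steps.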


Implementing Elasticity
Elastic Load Balancing, CloudWatch, and AutoScaling

[Diagram. Labels: Elastic Load Balancing, Utilization, Auto Scaling, CloudWatch, Metrics.]


Use a Chaos Monkey

From the Netflix blog:

Simple monkey:
• Kill any instance in the account

Complex monkey:
• Kill instances with specific tags
• Introduce other faults (e.g. connectivity via Security Group)

Human monkey:
• Kill instances from the AWS Management Console

http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html
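The target selection behind the simple and complex monkeys can be sketched without touching a real account. The code below only picks a victim from an in-memory inventory and terminates nothing. It is not Netflix's implementation; the function names, instance IDs, and tags are hypothetical.

```python
import random
from typing import Dict, Optional

Inventory = Dict[str, Dict[str, str]]  # instance ID -> tags


def simple_monkey(instances: Inventory, rng: random.Random) -> Optional[str]:
    """Pick any instance in the inventory, or None if it is empty."""
    if not instances:
        return None
    return rng.choice(sorted(instances))


def complex_monkey(instances: Inventory, required_tags: Dict[str, str],
                   rng: random.Random) -> Optional[str]:
    """Pick an instance whose tags include every required key/value pair."""
    candidates = sorted(
        instance_id
        for instance_id, tags in instances.items()
        if all(tags.get(key) == value for key, value in required_tags.items())
    )
    return rng.choice(candidates) if candidates else None
```

Passing a seeded `random.Random` makes a run repeatable, which helps when a kill exposes a weakness that needs to be reproduced. Restricting by tag lets the experiment start in a test tier before it reaches production.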

AWS Architecture Center

aws.amazon.com/architecture

White papers:
• Cloud architectures
• Building fault-tolerant applications
• Web hosting best practices
• Leveraging different storage options
• AWS security best practices


Thank You!

