Architecting for the Cloud: Cloud Providers
- 1. © Matthew Bass 2013
Architecting for the Cloud
Len and Matt Bass
Cloud Providers
- 2. © Matthew Bass 2013
IaaS Providers
• There are several primary providers
– Amazon: Amazon Web Services (AWS)
– Microsoft: Azure
– Google: Google Compute Engine
– …
• Each of these is set up a bit differently, with slightly different
internal decisions and associated services
- 3. © Matthew Bass 2013
Goals
• The goal of this talk is not to give you a definitive how-to for
each provider
• It’s meant to give you just an introduction
• The idea is that you’ll see how the concepts that we talked
about in the course map to specific providers
• We’ll look primarily at Amazon (with some details from others
thrown in)
• We’ll go through both the overall structure and look at specific
services
- 4. © Matthew Bass 2013
Amazon Elastic Compute Cloud
• Amazon EC2 provides compute capacity in the cloud
• You can select the machine image with a given OS and specified
capability
• You can resize the capacity as needed
• Takes minutes to spin up a new VM
• You can specify multiple instances and select where they will run
– Region & availability zones
• You pay per hour of usage, depending on the capability of the instance
and whether it’s a reserved instance (capacity committed to in advance)
- 5. © Matthew Bass 2013
Regions
• Amazon has divided their cloud offerings into multiple regions. Each region
should be thought of as a separate cloud
– I.e. there is no automatic copying of data from one region to another.
- 6. © Matthew Bass 2013
Current AWS Regions
• North America:
– US East (5 availability zones)
– US West Oregon (3 availability zones)
– US West Northern California (3 availability zones)
– USGov Cloud (2 availability zones)
• South America
– Sao Paulo (2 availability zones)
• Europe
– Ireland (2 availability zones)
• Asia Pacific
– Sydney (2 availability zones)
– Singapore (2 availability zones)
– China (1 availability zone)
– Tokyo (3 availability zones)
- 7. © Matthew Bass 2013
AWS and Services
• Amazon Web Services offers a number of services
• These services are things like:
– Storage
– Database
– Network capabilities
– Monitoring
– …
• Not all services are available at all regions
– https://aws.amazon.com/about-aws/globalinfrastructure/regional-product-services/
- 8. © Matthew Bass 2013
Amazon Availability Zones
• Amazon has a notion of availability zones
• Engineered to be insulated from failures in other availability zones
• Availability zones are locations within a region
• Amazon has not announced the details of an availability zone, but presumably they
– Are physically separate data centers
– Have independent networks
– Have independent power delivery
– …
- 9. © Matthew Bass 2013
Amazon Service Level Agreement
• Amazon guarantees 99.95% availability for each region
• IaaS consumers are free to deploy their applications:
– Within an availability zone
– Across availability zones but within a region
– Across regions
• Amazon does not make any claim about the availability of their availability zones
(that I could find)
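As a rough worked example of what the 99.95% figure means, the sketch below converts an availability percentage into allowed downtime per year, and estimates the combined availability when any one of several redundant deployments is enough to serve traffic. The function names are illustrative, and the independence assumption is a simplification that is not part of Amazon's SLA.

```python
HOURS_PER_YEAR = 365 * 24  # ignores leap years


def allowed_downtime_hours(availability: float) -> float:
    """Hours per year a service may be down while still meeting `availability`."""
    return (1.0 - availability) * HOURS_PER_YEAR


def combined_availability(availability: float, copies: int) -> float:
    """Availability when any one of `copies` redundant deployments suffices.

    Assumes failures are independent, which real outages often violate.
    """
    return 1.0 - (1.0 - availability) ** copies


if __name__ == "__main__":
    print(f"{allowed_downtime_hours(0.9995):.2f} hours of downtime per year")
    print(f"{combined_availability(0.9995, 2):.8f} with two independent copies")
```

At 99.95% this works out to a little under four and a half hours of permitted downtime per year, which is why the choice between deploying within a zone, across zones, or across regions matters.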
- 14. © Matthew Bass 2013
Elastic Compute Cloud (EC2) & Redundancy
• EC2 supports different levels of redundancy
– It is up to the customer to determine how much redundancy they
wish to have and how much they wish to pay for it
• Redundant elements can be:
– Within an availability zone
– Across availability zones
– Across regions
- 15. © Matthew Bass 2013
Microsoft Azure Regions
• North America
– US Central (Iowa)
– US East (Virginia)
– US East 2 (Virginia)
– US North Central (Illinois)
– US South Central (Texas)
– US West (California)
• Europe
– Europe North (Ireland)
– Europe West (Netherlands)
• Asia Pacific
– East (Hong Kong)
– Southeast (Singapore)
• Japan
– Japan East (Saitama)
– Japan West (Osaka)
• Brazil
– Sao Paulo
- 16. © Matthew Bass 2013
Fault Domains in Azure
• In Azure there is the concept of Fault Domains
• A Fault Domain is essentially a rack in a given datacenter
• A consumer is not able to define which fault domains the
application is distributed to
– Unlike an availability zone
• As a result the fault domain is really an internal structure
- 17. © Matthew Bass 2013
Upgrade Domains in Azure
• An upgrade domain is similar to a fault domain
• Essentially an upgrade domain will be upgraded at one time
– When Microsoft upgrades their internal infrastructure they do so a
domain at a time
• In order to guard against both failures within a fault domain and
upgrades, you need to replicate across both fault and upgrade
domains
• This is called an availability set
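To illustrate why replication has to span both kinds of domain, here is a minimal sketch that spreads instances round-robin over fault and upgrade domains and then checks which instances survive when one fault domain fails while one upgrade domain is being upgraded. The round-robin placement and the domain counts are assumptions for illustration only; the slides do not describe how Azure actually places instances.

```python
def place_instances(count, fault_domains, upgrade_domains):
    """Spread `count` instances round-robin over fault and upgrade domains."""
    return [
        {
            "instance": i,
            "fault_domain": i % fault_domains,
            "upgrade_domain": i % upgrade_domains,
        }
        for i in range(count)
    ]


def survivors(placement, failed_fault_domain=None, upgrading_domain=None):
    """Instances still serving when one fault domain is down and/or
    one upgrade domain is being upgraded."""
    return [
        p["instance"]
        for p in placement
        if p["fault_domain"] != failed_fault_domain
        and p["upgrade_domain"] != upgrading_domain
    ]


if __name__ == "__main__":
    placement = place_instances(count=4, fault_domains=2, upgrade_domains=5)
    print(survivors(placement, failed_fault_domain=0, upgrading_domain=1))
```

With a single instance, any failure of its fault domain leaves nothing running, which is the case the availability set is meant to avoid.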
- 19. © Matthew Bass 2013
Amazon Auto Scaling
• Auto Scaling works in conjunction with CloudWatch (Amazon’s monitoring
service)
• The idea is the monitoring service monitors the metrics
– CPU utilization
– Latency
– Memory consumption
• The Auto Scaling solution establishes the rules
– Add instances when utilization exceeds 70%
– Remove instances when utilization falls below 10%
• You can specify things like a “cooling off” period
– Where no action is taken until the system has a chance to stabilize
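The rules on this slide can be sketched as a small decision function. This is a plain-Python model of the idea, not the Auto Scaling API: the class name, the 300-second cooling-off period, and the instance limits are hypothetical values chosen for illustration, while the 70% and 10% thresholds come from the slide.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    scale_out_above: float = 70.0  # add an instance above this CPU %
    scale_in_below: float = 10.0   # remove an instance below this CPU %
    cooldown_seconds: int = 300    # hypothetical cooling-off period
    min_instances: int = 1         # hypothetical lower bound
    max_instances: int = 10        # hypothetical upper bound


def decide(policy, cpu_percent, instances, now, last_action_at):
    """Return (new_instance_count, new_last_action_at)."""
    in_cooldown = (
        last_action_at is not None
        and now - last_action_at < policy.cooldown_seconds
    )
    if in_cooldown:
        # Take no action until the system has had a chance to stabilize.
        return instances, last_action_at
    if cpu_percent > policy.scale_out_above and instances < policy.max_instances:
        return instances + 1, now
    if cpu_percent < policy.scale_in_below and instances > policy.min_instances:
        return instances - 1, now
    return instances, last_action_at


if __name__ == "__main__":
    policy = ScalingPolicy()
    print(decide(policy, cpu_percent=85, instances=2, now=1000, last_action_at=None))
```

The wide gap between the two thresholds, together with the cooling-off period, keeps the group from oscillating when load hovers near a single threshold.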
- 20. © Matthew Bass 2013
Amazon Elastic Load Balancer
• This is Amazon’s load balancing solution
– Recall the push/pull architecture discussion
• It tracks the status and location of instances
• Routes requests to healthy instances based on criteria that you establish
• Can be used in conjunction with Auto Scaling
– When new instances are added or removed they are registered with the ELB
• Can be used in conjunction with Amazon’s DNS (Route 53)
– You can use DNS failover to move from one region to another
– The DNS will route traffic to the ELB in the target region
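A minimal in-memory sketch of the behaviour described above: instances are registered and deregistered (as Auto Scaling would do), health is tracked, and requests go only to healthy instances. Round-robin is an assumed routing criterion and the class and method names are hypothetical; this is not the ELB API.

```python
class LoadBalancer:
    """Toy model of a load balancer that routes only to healthy instances."""

    def __init__(self):
        self._healthy = {}  # instance id -> True if last health check passed
        self._counter = 0   # number of requests routed so far

    def register(self, instance_id):
        self._healthy[instance_id] = True

    def deregister(self, instance_id):
        self._healthy.pop(instance_id, None)

    def report_health(self, instance_id, healthy):
        if instance_id in self._healthy:
            self._healthy[instance_id] = healthy

    def route(self):
        """Pick the next healthy instance in round-robin order."""
        candidates = [i for i, ok in self._healthy.items() if ok]
        if not candidates:
            raise RuntimeError("no healthy instances")
        choice = candidates[self._counter % len(candidates)]
        self._counter += 1
        return choice


if __name__ == "__main__":
    lb = LoadBalancer()
    for name in ("a", "b", "c"):
        lb.register(name)
    lb.report_health("b", False)
    print([lb.route() for _ in range(3)])
```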
- 21. © Matthew Bass 2013
Amazon Simple Queue Service
• SQS is Amazon’s queuing service
– Again recall the push/pull architecture discussion
• It’s a service that supports message queues
• Recall it can be used in conjunction with Auto Scaling to
manage the elasticity of your application
• Pricing is per million requests handled
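One way a queue can drive elasticity is to size the worker pool from the backlog. The sketch below is an assumed sizing rule, not something SQS or Auto Scaling prescribes: the per-worker throughput, the target drain time, and the worker limits are all hypothetical parameters.

```python
import math


def workers_needed(queue_depth, msgs_per_worker_per_sec, target_drain_seconds,
                   min_workers=1, max_workers=20):
    """Workers required to drain the current backlog within the target time."""
    capacity_per_worker = msgs_per_worker_per_sec * target_drain_seconds
    wanted = math.ceil(queue_depth / capacity_per_worker)
    # Clamp to the configured bounds so the pool never empties or runs away.
    return max(min_workers, min(max_workers, wanted))


if __name__ == "__main__":
    # 1200 queued messages, 5 messages/s per worker, drain within 60 s.
    print(workers_needed(1200, 5, 60))
```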
- 22. © Matthew Bass 2013
Amazon Storage Solutions
• Amazon has several storage solutions
– Elastic Block Store (EBS)
– Simple Storage Service (S3)
– Glacier
• These provide raw unmanaged storage
• This is useful for:
– Disaster recovery
– Backup
– Archiving
– Persistence for your own database solution
- 23. © Matthew Bass 2013
Amazon Elastic Block Store
Amazon Elastic Block Store (EBS) is Amazon’s block storage service.
Some of its features are
• Data is persisted independently from instances
• EBS data is placed in a specific availability zone and can be attached to instances in
the same availability zone
• EBS data is automatically replicated within its availability zone
• There are two networks that connect EBS instances
– A high speed network to provide coordination among instances and move data between
instances.
– A lower speed network used as backup for coordination.
• $0.05 per million I/O requests
- 24. © Matthew Bass 2013
Amazon Simple Storage Service (S3)
• S3 is a scalable storage solution
• Good for content storage and distribution
• Good for backup, archiving, and disaster recovery
• Costs $0.03 per GB of data per month
• More expensive but faster than Glacier
• Not as fast for I/O as EBS
- 25. © Matthew Bass 2013
Amazon Glacier
• Low cost storage solution
• Good for off-site archival of enterprise data
• Good for backup and data archiving
• Good for large volumes of data
• Costs $0.01 per GB of data per month
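To make the S3 versus Glacier trade-off concrete, here is a small cost comparison using the per-GB prices quoted on these two slides. It assumes the prices are charged per GB per month and ignores request and retrieval fees, which the slides do not cover; the names are illustrative.

```python
# Storage prices in dollars per GB, as quoted on the slides.
# Assumed to be monthly rates; request and retrieval fees are not modeled.
PRICE_PER_GB = {"s3": 0.03, "glacier": 0.01}


def monthly_storage_cost(service, gigabytes):
    """Monthly storage cost in dollars for `gigabytes` held in `service`."""
    return PRICE_PER_GB[service] * gigabytes


if __name__ == "__main__":
    for service in PRICE_PER_GB:
        print(service, round(monthly_storage_cost(service, 5000), 2))
```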
- 26. © Matthew Bass 2013
Amazon Database Solutions
• Amazon has a number of fully managed database solutions
• These are built on top of one of Amazon’s storage solutions
• They include:
– DynamoDB
– Relational Database Service (RDS)
– Redshift
– ElastiCache
- 27. © Matthew Bass 2013
DynamoDB
• Key Value data store
• Uses a throughput oriented pricing model (rather than a
storage oriented model)
• Uses solid state drives
• Offers single-digit millisecond read latencies
• You pay a flat hourly rate based on capacity that you reserve
– Costs $0.0065 per hour for every 10 units of write capacity
– Costs $0.0065 per hour for every 10 units of read capacity
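The throughput-oriented pricing can be worked through as simple arithmetic. The sketch below uses the rate quoted on the slide; rounding up to whole blocks of 10 units is an assumption made here for illustration, and the function name is hypothetical.

```python
import math

RATE_PER_10_UNITS_PER_HOUR = 0.0065  # from the slide, same for reads and writes


def hourly_cost(read_units, write_units):
    """Hourly cost in dollars for the reserved read and write capacity.

    Assumes capacity is billed in whole blocks of 10 units, rounded up.
    """
    read_blocks = math.ceil(read_units / 10)
    write_blocks = math.ceil(write_units / 10)
    return (read_blocks + write_blocks) * RATE_PER_10_UNITS_PER_HOUR


if __name__ == "__main__":
    # 500 read units and 100 write units reserved.
    print(round(hourly_cost(500, 100), 4))
```

Note that the cost depends only on the capacity reserved, not on how much data is stored, which is the contrast with the storage-oriented pricing of S3 and Glacier.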
- 28. © Matthew Bass 2013
Relational Database Service
• A distributed relational web service that provides a
relational database for use in applications
• It provides access to MySQL, Oracle, SQL Server, or
PostgreSQL
• It simplifies installation, patching, and backup related
issues
• Priced per hour according to database type, size, and number of instances
- 29. © Matthew Bass 2013
Redshift
• Redshift is Amazon’s data warehousing solution
• Integrates with other storage solutions
• Priced from $0.25 per hour on the low end
• Or $1000 per terabyte per year
- 30. © Matthew Bass 2013
ElastiCache
• A Web Service that enables an in memory data cache
• Supports:
– Memcached
– Redis
• Improves latency and throughput for read heavy applications
• Prices are per Cache node/hour
- 31. © Matthew Bass 2013
Amazon CloudFront
• Amazon’s content delivery network
• Provides edge services
– Competes with companies such as Akamai
• This service will allow you to locate content closer to users
– Reduces latency
• You specify the edge location and point it to the origin
• You can route DNS to the edge location if you want
- 32. © Matthew Bass 2013
Amazon Elastic IP Addressing
• Amazon provides elastic IP addressing
• The IP address is associated with your account – not with an
instance
• You can programmatically map the elastic IP to any instance in
your account
• In this way you make the deployment configuration
transparent to the user/application
– Remember the virtual network discussion?
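The remapping idea can be sketched with a tiny in-memory model: the address belongs to the account, and pointing it at a different instance does not change what clients connect to. The class, method names, and instance ids are hypothetical, and the address comes from a range reserved for documentation; this is not the EC2 API.

```python
class Account:
    """Toy model of an account that owns elastic IP addresses."""

    def __init__(self):
        self.elastic_ips = {}  # address -> instance id, or None if unmapped

    def allocate(self, address):
        """The address now belongs to the account, not to any instance."""
        self.elastic_ips[address] = None

    def associate(self, address, instance_id):
        """Map (or remap) an address the account owns to an instance."""
        if address not in self.elastic_ips:
            raise KeyError(f"{address} is not allocated to this account")
        self.elastic_ips[address] = instance_id

    def resolve(self, address):
        return self.elastic_ips[address]


if __name__ == "__main__":
    account = Account()
    account.allocate("203.0.113.10")
    account.associate("203.0.113.10", "i-primary")
    # The primary fails; remap the same address to a standby instance.
    account.associate("203.0.113.10", "i-standby")
    print(account.resolve("203.0.113.10"))
```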
- 33. © Matthew Bass 2013
Many Other Services Available
• Authentication services
• Analytics
• Elastic Map Reduce
• Real time data streaming and processing
• Business process automation services
• Email services
• Notification services
• …
- 34. © Matthew Bass 2013
Comparison to Other Providers
• Other major providers (Google, Microsoft, Rackspace) offer
similar services
• Google doesn’t have as many services but has a different pricing
model
– Charges in 10-minute increments rather than one-hour increments
• Microsoft has similar services
• Rackspace also provides comparable options
- 35. © Matthew Bass 2013
Outages
• In Amazon (and others) there are some kinds of outages that
are specific to the structure of the provider
• We will now look at some of these outages
- 36. © Matthew Bass 2013
Zone Failure
• All of the IaaS providers have some notion of an “availability zone”
• An availability zone (or fault domain in Azure) has its own switch,
router, and rack
• These availability zones are isolated from each other in a way that
nodes within an availability zone are not
- 37. © Matthew Bass 2013
Zone Failure Modes
• A zone can fail in different ways
[Figure: a region containing Zone 1, Zone 2, and Zone 3]
- 38. © Matthew Bass 2013
Complete Failure
• If for example you have a power outage you’ll have a complete
failure
• If you try to route traffic to any of these machines you’ll get a “no
route to host”
– This happens quickly – fast fail
• You’ll know the zone is out
• You can then spin up a new zone elsewhere
- 39. © Matthew Bass 2013
Zone Failure Modes
• You could have a network failure
[Figure: a region containing Zone 1, Zone 2, and Zone 3]
- 40. © Matthew Bass 2013
Network Failure
• If you have a network failure it’s typically not a complete failure
• The machines are still working but the network is having trouble
• There is often still a route to host but your data isn’t reaching the
host
• As a result you don’t get a fast fail
– You’ll get long timeouts
- 41. © Matthew Bass 2013
Network Failure
• With the long timeouts your system will start to back up
• It’s difficult to tell the difference between this issue and other
issues that result in latency lags
• This problem can be intermittent as some of the routers might be
down but not all
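Because a network failure does not fail fast, callers usually need to impose their own deadline so that requests do not pile up behind long timeouts. The following is a minimal sketch using only the Python standard library; the helper name and the pool size are hypothetical, and a real client would more likely set connect and read timeouts on its network library.

```python
import concurrent.futures
import time

# Small shared pool; the size is an arbitrary choice for this sketch.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_deadline(fn, deadline_seconds, *args):
    """Run fn(*args) but stop waiting after deadline_seconds.

    The abandoned call keeps running in its worker thread; only the
    caller is released, so the caller can fail over or shed load.
    """
    future = _pool.submit(fn, *args)
    try:
        return ("ok", future.result(timeout=deadline_seconds))
    except concurrent.futures.TimeoutError:
        return ("timeout", None)


def slow_remote_call():
    """Stand-in for a request to a host that is reachable but not answering."""
    time.sleep(0.5)
    return "late reply"


if __name__ == "__main__":
    print(call_with_deadline(slow_remote_call, 0.05))
    print(call_with_deadline(lambda: "fast reply", 1.0))
```

Turning a slow failure into an explicit timeout result makes it look like the fast-fail case to the rest of the system, although it still cannot say whether the cause is the network or some other source of latency.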
- 42. © Matthew Bass 2013
Zone Failure Modes
• You could have a failure of some zone service
[Figure: a region containing Zone 1, Zone 2, and Zone 3]
- 43. © Matthew Bass 2013
Zone Service Failure
• This is when a service that the zone depends on fails
– It could be something that is part of the platform as a service (e.g.
EBS)
– It could also be a central service in your application
• This causes cascading failures
• Difficult to figure out what is going on
- 44. © Matthew Bass 2013
Region Failure
• It’s rare but a Region can fail as well
• Both complete and partial failures have happened
• Typically this starts with isolated issues that cascade
• There might be an issue with a few nodes or with a single availability zone
• Other zones become impacted (often due to additional traffic) and fail
– It can be difficult to determine the scope of the issue while it’s occurring
- 45. © Matthew Bass 2013
Regional Failure Modes
• You could lose network access to a region
[Figure: a region containing Zone 1, Zone 2, and Zone 3]
- 46. © Matthew Bass 2013
Regional Outage
• This is often caused by
– a DNS issue
– Router issues
– Network capacity overload
• Causes you to lose access to a region
- 47. © Matthew Bass 2013
Regional Failure Modes
• Local failures can cause a control plane overload
[Figure: a region containing Zone 1, Zone 2, and Zone 3]
- 48. © Matthew Bass 2013
Data Store Failure
• As with the other portions of the system the data store can become
unresponsive
• The remedy for this is typically to mark this node as bad and attempt to
bring a new node online
• If the issue is more pervasive it can result in:
– Disrupted availability
– Loss of persistent data
- 49. © Matthew Bass 2013
Backup Failure
• Systems will often have a backup data mechanism
• This is often a key component in disaster recovery
• This can also fail
– It can become temporarily or permanently unavailable
- 50. © Matthew Bass 2013
Upgrades
• Cloud providers need to upgrade their software as well
• When they do this the nodes that are being upgraded
experience an outage
• If your software is running on these nodes you might
experience an outage as well
- 51. © Matthew Bass 2013
Utilizing AWS
• You can utilize AWS in many ways
– You can host your entire application in the cloud
– You can host a specific portion of your application in the cloud
– You can use the cloud for a specialized need
- 52. © Matthew Bass 2013
Hosting Your Application
• You can have a system that is fully deployed in the cloud
• You’ll need to figure out how to structure the application to achieve both functional and quality
attribute needs
• You’ll want to first consider quality attribute concerns such as:
– Scalability
– Availability
– Security
– …
• Utilize the techniques we talked about to determine the needs
– Fault modeling (considering the cloud specific faults)
– Threat modeling
– Understanding the anticipated load and desired throughput and latency
• Come up with a gross structure that achieves your objectives
– Think about partitioning of the system to support testing, degraded modes of operation and independent
deployment
- 53. © Matthew Bass 2013
Partial Hosting
• You might want to leverage the cloud for a specific portion of your
system e.g.
– Supporting mobile applications
– Databases
– Analytics
– Delivery of particular content
– Hosting your front end
– …
• This is typically going to be driven by cost and quality attribute
needs (e.g. scalability)
- 54. © Matthew Bass 2013
Backup and Recovery
• Many organizations utilize the cloud for bulk storage, archiving,
or backup and recovery
• In the past external services were used for such needs
– They often stored data on tape in separate physical locations
• It can be cheaper and more convenient to utilize cloud services
• As a result many organizations use the cloud for such storage
needs
- 55. © Matthew Bass 2013
Summary
• Many services are available in the cloud
– Storage
– Network
– Compute related services
– …
• These services provide different levels of service at different pricing
levels
• Utilizing the cloud appropriately and efficiently takes an explicit
understanding of both your needs and the services available