AWS Government, Education, &
Nonprofits Symposium
Canberra, Australia | May 6, 2015
“Spikey Workloads”: Emergency Management in the Cloud
Cameron Maxwell
Professional Services
Amazon Web Services
Michael Jenkins
Chief Architect
Emergency Management Victoria
Emergency Management Victoria
Use of AWS for Emergency Management
• We have adopted AWS for new Emergency Management
Workloads
• AWS has been used for public/community publishing to
Mobile and Web sites, notably the FireReady mobile app
and the http://emergency.vic.gov.au website.
• AWS is a fundamental enabler for our latest systems,
particularly ‘EM-COP’, a geospatial collaboration platform
for all responders across all related agencies, departments,
and private organisations in Victoria.
Emergency Management Tech 101
• From a tools and technology perspective, the
biggest challenge is being ready for sudden
spikes in workload
• Preparation, forecasts, and testing are essential
to be ready for massive demand at short/no
notice
• System performance matters most during an
emergency, which is the period of maximum use
System Failures Affect Us All
• London's emergency services experience telecoms failure
• Telstra dials D for divert as emergency call fail safe
• Victorian emergency dispatch systems fail six times in eight months
• Flood warning failed to reach many in Jambin
• Brisbane Floods – Council Web Site stays down during crisis
Prepare for Massive Demand
Prepare for Bursts of Demand
Think Through the Failure Scenarios
• Everything will fail eventually – how you respond
to failure is all-important
• Have a plan B, C, and D where possible
• Research past failures – your own, your service
providers’, and those of other organisations in the sector
• Don’t repeat past mistakes
Emergency Management in the Cloud
• Elastic, on demand
• Web scale
• Low cost
• Full range of services
• Many options for
reliability, availability
Figure 1. The Cloud
Design and Test for High Scalability
Scaling Services
• Tested to 200,000 simultaneous users
• Tested to 40 new events/minute
• Tested to 66,000,000 notifications/hour
• Tested to 240,000 requests/hour
• Handles additional peak loads
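To put the hourly test figures in per-second terms, the conversion can be sketched as below. The function and dictionary names are illustrative only; the two hourly figures are the ones quoted on the slide, and the result is an average rate, which says nothing about how bursty the tested load was.

```python
SECONDS_PER_HOUR = 3600


def per_second(per_hour: float) -> float:
    """Convert an hourly rate into an average per-second rate."""
    return per_hour / SECONDS_PER_HOUR


# Hourly figures quoted on the slide.
tested_rates_per_hour = {
    "notifications": 66_000_000,
    "requests": 240_000,
}

for name, rate in tested_rates_per_hour.items():
    print(f"{name}: {per_second(rate):,.1f} per second on average")
```

On these figures the notification tier was tested at roughly 18,300 notifications per second on average, and the request tier at roughly 67 requests per second.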
Engineer for Reliability Under Load
• Design for Resilience
– Reduce Single Points of Failure
– Use CloudFront or another CDN instead of your own web server
cluster to reduce the compute dependency during high demand
– Design for multiple availability zones and regions from day one
• Design for Rapid Response and Recovery
– Integrate Route53 health checks and CloudWatch alarms for
automatic failover
– Invest the time in tuning ASG triggers and test them extensively
Launch When Proven, Certain
• An unreliable system can be worse than nothing
at all
• Maintain multiple channels for communication
and control
• Always consider business continuity and manual
work-arounds for worst case scenarios
• Even a few minutes of outage could cost people’s
lives
Scale Down to Save Costs
• Engineering for massive scale can be
prohibitively expensive, unless elastic services
are used
• AWS allows us to provide assured performance
for massive demand at short notice, and just as
quickly scale back to minimal cost
• With AWS we can deliver systems we could not
otherwise afford to operate
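The cost argument can be made concrete with a small worked calculation. Every number below (instance counts, hourly rate, hours at peak) is a hypothetical input chosen for illustration, not a figure from the talk or from AWS pricing; only the shape of the comparison matters.

```python
HOURS_PER_MONTH = 730  # average hours in a month


def fixed_cost(peak_instances: int, hourly_rate: float) -> float:
    """Monthly cost of running peak capacity all the time."""
    return peak_instances * hourly_rate * HOURS_PER_MONTH


def elastic_cost(
    baseline_instances: int,
    peak_instances: int,
    peak_hours: float,
    hourly_rate: float,
) -> float:
    """Monthly cost of a small baseline plus extra instances only during peaks."""
    baseline = baseline_instances * hourly_rate * HOURS_PER_MONTH
    burst = (peak_instances - baseline_instances) * hourly_rate * peak_hours
    return baseline + burst


# Hypothetical scenario: 100 instances at peak, 4 at rest, 24 peak hours a month.
always_on = fixed_cost(peak_instances=100, hourly_rate=0.10)
scaled = elastic_cost(
    baseline_instances=4, peak_instances=100, peak_hours=24, hourly_rate=0.10
)
print(f"always-on: {always_on:.2f}, elastic: {scaled:.2f}")
```

Under these assumed inputs the always-on design costs about 7,300 per month against about 522 for the elastic one; the gap grows as peaks become rarer relative to the baseline.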
Conclusion
• We use AWS to rapidly scale up and down to
service unpredictable, spikey workloads
• We’ve engineered highly available and resilient
systems within AWS
• Hosting in AWS allows us to deliver and operate
systems that are reliable in emergencies without
investing in “worst case” infrastructure
Scalable Messaging Architecture
[Diagram. Components: OSOM Feed (Incident); Master Node; SQS queues: Total Fire Ban, Incidents / Warnings, Notif Batches; Slave Nodes; Sender nodes; Push Notif Broker; APNS (iOS); GCM (Android); End User.]
[CloudWatch Alarms: > 500 messages; > 1000 messages; > 2000 messages.]
[Autoscaling Actions: +50% instances; +100% instances; +200% instances; each with 5m grace & cooldown.]
Scalable Messaging Architecture
• Isolate different compute loads into independent Autoscaling groups
• Leverage queuing between processing tiers
• Scale up based on size of the preceding queue
• Use multiple queues for differing priority
• Use multiple scaling rules to handle logarithmic load increase
• Improve scaling event response times by reducing instance boot time
• Leverage AutoScaling for HA
[Diagram labels: AutoScaling Group, SQS Queue, AutoScaling Group, SQS Queue, AutoScaling Group, CloudWatch]
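The "multiple scaling rules" point can be sketched as a step function from queue depth to desired instance count. The thresholds (500, 1000, 2000 messages), the percentages (+50%, +100%, +200%) and the 5 minute cooldown come from the architecture diagram, but the pairing of each threshold with each percentage is an assumption, since the extracted diagram does not show which alarm drives which action. The names `StepRule`, `desired_capacity`, and `in_cooldown` are hypothetical, and this is plain Python rather than an Auto Scaling API call.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StepRule:
    threshold: int          # alarm fires when queue depth exceeds this
    percent_increase: int   # capacity added, as a percentage of current


# ASSUMED pairing of the diagram's alarm thresholds and scaling actions.
ASSUMED_RULES = (StepRule(500, 50), StepRule(1000, 100), StepRule(2000, 200))
COOLDOWN_SECONDS = 5 * 60   # "5m grace & cooldown" from the diagram


def desired_capacity(
    current: int, queue_depth: int, rules: Sequence[StepRule] = ASSUMED_RULES
) -> int:
    """Apply the rule with the highest breached threshold, rounding up."""
    breached = [r for r in rules if queue_depth > r.threshold]
    if not breached:
        return current
    rule = max(breached, key=lambda r: r.threshold)
    return current + math.ceil(current * rule.percent_increase / 100)


def in_cooldown(
    now: float, last_scaled_at: Optional[float], cooldown: int = COOLDOWN_SECONDS
) -> bool:
    """True while a previous scaling action is still inside its cooldown window."""
    return last_scaled_at is not None and now - last_scaled_at < cooldown


print(desired_capacity(current=4, queue_depth=1500))
```

Larger backlogs trigger proportionally larger steps, so a sudden spike is met with one big scale-out instead of several small ones separated by cooldown periods.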
Testing
• What works well at the small scale does not always translate to
large scale
• Playback of real events to simulate known situations
• Test each component / tier independently in addition to the whole
• Use mockups / stubs to simulate external entities
• Know your user base and their platforms
• Test your scaling capability and response time
• Test your availability!
Unit Testing
• Replay data from previous known event
• Test processing tiers independently
• Input and output should have a known correlation
• Reuse input data from failed tests
[Diagram. Components, each tier tested independently: Message Feed; Load Generator; Master Node; Incidents / Warnings SQS Queue; Slave Nodes; Notif Batches SQS Queue; Sender Nodes; Mockup Notification Destination.]
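The unit testing points above (replay known data, test a tier on its own, compare against a known output, keep failing inputs) can be sketched as a small harness. The tier `batch_devices`, the event layout, and the expected results are all hypothetical; the talk does not describe the actual tier interfaces.

```python
from typing import Callable, Dict, List


def batch_devices(event: dict, batch_size: int = 2) -> List[List[str]]:
    """Hypothetical processing tier: split an event's device list into batches."""
    devices = event["devices"]
    return [devices[i:i + batch_size] for i in range(0, len(devices), batch_size)]


def replay(
    events: List[dict],
    tier: Callable[[dict], object],
    expected: Dict[str, object],
) -> List[dict]:
    """Run recorded events through one tier and return the inputs that failed."""
    failed = []
    for event in events:
        if tier(event) != expected[event["id"]]:
            failed.append(event)  # keep the input so it can be reused later
    return failed


# Hypothetical recording of a previous event and its known-good output.
recorded_events = [
    {"id": "inc-1", "devices": ["a", "b", "c"]},
    {"id": "inc-2", "devices": []},
]
expected_batches = {
    "inc-1": [["a", "b"], ["c"]],
    "inc-2": [],
}

failed_inputs = replay(recorded_events, batch_devices, expected_batches)
print(f"{len(failed_inputs)} failing inputs")
```

Returning the failing inputs, not just a pass or fail flag, is what makes it possible to rerun exactly those cases after a fix.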
End to end load testing
• Replay data from a previous known event, but add a load multiplier, e.g. 10x users, then 100x
users
• Identify weak points in the architecture & fix moving forward
• Identify Autoscaling parameters
• Mockup destinations must be as simple as possible
– 4 lines of PHP that dump the HTTP POST beat 1000s of lines of Java that add unnecessary complexity
[Diagram. End-to-end flow components: Message Feed; Load Generator; Master Node; Incidents / Warnings SQS Queue; Slave Nodes; Notif Batches SQS Queue; Sender Nodes; Mockup Notification Destination.]
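Two of the points above lend themselves to a short sketch: applying a load multiplier to recorded data, and keeping the mock destination trivial. The talk's mock was a few lines of PHP; the Python handler below is an equivalent written for illustration, not the original. The record layout with a `user` field, the `amplify` function, and the `DumpPostHandler` class name are assumptions.

```python
import copy
from http.server import BaseHTTPRequestHandler
from typing import List


def amplify(events: List[dict], multiplier: int) -> List[dict]:
    """Replay each recorded event `multiplier` times with distinct synthetic users."""
    amplified = []
    for event in events:
        for n in range(multiplier):
            clone = copy.deepcopy(event)
            clone["user"] = f'{event["user"]}-x{n}'  # synthetic user id
            amplified.append(clone)
    return amplified


class DumpPostHandler(BaseHTTPRequestHandler):
    """Mock destination: store the POST body and answer 200, nothing more."""

    received: List[bytes] = []  # shared across requests, for later inspection

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        DumpPostHandler.received.append(self.rfile.read(length))
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # stay quiet under load


recorded = [{"user": "u1", "type": "incident"}]
print(len(amplify(recorded, 10)), len(amplify(recorded, 100)))
```

The handler can be served with the standard library's `HTTPServer`, for example `HTTPServer(("127.0.0.1", 8080), DumpPostHandler).serve_forever()`, where the address and port are placeholders.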
Auditability and Accounting
[Diagram. Audit path components: SNS Topic; SQS Queue; Event Processor; DynamoDB Table; Auditor. Pipeline components: Message Feed; Load Generator; Master Node; Incidents / Warnings SQS Queue; Slave Nodes; Notif Batches SQS Queue; Sender Nodes; Mockup Notification Destination.]
Auditability and Accounting
• Correlation, summaries, accuracy
• Testing
– 100% aim
– Reproducibility
– Rapid iterations
• Operations
– Throughput
– Analysis of third parties’ performance
• Asynchronous is a must
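The correlation step of an audit can be sketched as a comparison between the notification ids that were sent and the ids the destination actually recorded. The `audit` function and its definition of accuracy (share of sent ids seen at least once at the destination) are assumptions made for illustration; the talk does not say how its accuracy figure was computed.

```python
from collections import Counter
from typing import Dict, Iterable


def audit(sent_ids: Iterable[str], delivered_ids: Iterable[str]) -> Dict[str, object]:
    """Correlate sent and delivered ids and summarise the differences."""
    sent = set(sent_ids)
    counts = Counter(delivered_ids)
    delivered = set(counts)
    return {
        # Assumed definition: fraction of sent ids delivered at least once.
        "accuracy": len(sent & delivered) / len(sent) if sent else 1.0,
        "missing": sorted(sent - delivered),
        "unexpected": sorted(delivered - sent),
        "duplicates": sorted(i for i, n in counts.items() if n > 1),
    }


report = audit(sent_ids=["a", "b", "c"], delivered_ids=["a", "b", "b"])
print(report)
```

Because the function only needs two lists of ids, it can run after the fact against stored records, which fits the point that auditing should be asynchronous and stay out of the sending path.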
Uptake and success
User base tripled in 2 weeks
As of midnight on 17th Jan
• Over 330k devices registered
• 5.7M Total push notifications
• 1.4M Warnings
• 3.8M Incidents
• #1 App on iTunes store
Continuous Improvement
• Infrastructure and application evolve in tandem
• Accuracy
– Started at 66% accuracy @ 1% peak simulated load
– Finished at 100% accuracy @ 10% peak simulated load
– 3 days elapsed time
• Test frequency
– Started with > 4 hour turnaround
– Finished with < 1 hour turnaround
“…had been outside just 20
minutes earlier checking…”
“… just 30-40m away from
the house.”
“Without the app we wouldn’t
have known.”
Thank you

“Spikey Workloads” Emergency Management in the Cloud

  • 1. AWS Government, Education, & Nonprofits Symposium Canberra, Australia | May 6, 2015 “Spikey Workloads”: 
 Emergency Management in the Cloud Cameron Maxwell Professional Services Amazon Web Services Michael Jenkins Chief Architect Emergency Management Victoria
  • 3. Use of AWS for Emergency Management • We have adopted AWS for new Emergency Management Workloads • AWS has been used for public/community publishing to Mobile and Web sites, notably the FireReady mobile app and the http://emergency.vic.gov.au website. • AWS is a fundamental enabler for our latest systems, particularly ‘EM-COP’, a geospatial collaboration platform for all responders across all related agencies, departments, and private organisations in Victoria.
  • 4. Emergency Management Tech 101 • From a tools and technology perspective, the biggest challenge is being ready for sudden spikes in workload • Preparation, forecasts, and testing are essential to be ready for massive demand at short/no notice • System performance matters most during an emergency, which is the period of maximum use
  • 5. System Failures Affect Us All • London's emergency services experience telecoms failure • Telstra dials D for divert as emergency call fail safe • Victorian emergency dispatch systems fail six times in eight months • Flood warning failed to reach many in Jambin • Brisbane Floods – Council Web Site stays down during crisis
  • 7. Prepare for Bursts of Demand
  • 8. Think Through the Failure Scenarios • Everything will fail eventually – how you respond to failure is all important • Have a plan B, C, and D where possible • Research past failures – your own, your service providers, other organisations in the sector. • Don’t repeat past mistakes
  • 9. Emergency Management in the Cloud • Elastic, on demand • Web scale • Low cost • Full range of services • Many options for reliability, availability • Figure 1. The Cloud
  • 10. Design and Test for High Scalability • Scaling Services • Tested to 200,000 simultaneous users • Tested to 40 new events/minute • Tested to 66,000,000 notifications/hour • Tested to 240,000 requests/hour • Handles additional peak loads
  • 11. Engineer for Reliability Under Load • Design for Resilience – Reduce Single Points of Failure – Use CloudFront or another CDN instead of your own web server cluster to reduce the compute dependency during high demand – Design for multiple availability zones and regions from day one • Design for Rapid Response and Recovery – Integrate Route 53 health checks and CloudWatch alarms for automatic failover – Invest the time in tuning Auto Scaling group (ASG) triggers and test them extensively
  • 12. Launch When Proven, Certain • An unreliable system can be worse than nothing at all • Maintain multiple channels for communication and control • Always consider business continuity and manual work-arounds for worst case scenarios • Even a few minutes of outage could cost people's lives
  • 13. Scale Down to Save Costs • Engineering for massive scale can be prohibitively expensive, unless elastic services are used • AWS allows us to provide assured performance for massive demand at short notice, and just as quickly scale back to minimal cost • With AWS we can deliver systems we could not otherwise afford to operate
  • 14. Conclusion • We use AWS to rapidly scale up and down to service unpredictable, spikey workloads • We’ve engineered highly available and resilient systems within AWS • Hosting in AWS allows us to deliver and operate systems that are reliable in emergencies without investing in “worst case” infrastructure
  • 15. Scalable Messaging Architecture (diagram) • Components: OSOM Feed, Incident, Master Node, Slave Nodes, Sender nodes, Push Notif Broker, APNS (iOS), GCM (Android), End User • SQS queues: Total Fire Ban, Incidents / Warnings, Notif Batches • CloudWatch alarms: > 500 messages, > 1000 messages, > 2000 messages • Autoscaling actions: +50% instances, +100% instances, +200% instances, each with 5m grace & cooldown
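The alarm thresholds and scaling actions on this slide can be read as a stepped scaling rule driven by queue depth. The following is a minimal sketch of that idea in plain Python, not the presenters' configuration. It assumes the three alarms pair with the three actions in ascending order (> 500 with +50%, > 1000 with +100%, > 2000 with +200%), which the flattened diagram does not confirm, and it treats the 5m cooldown as a simple "no new action within 300 seconds" check. The names `ScalingStep` and `desired_capacity` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingStep:
    queue_depth_above: int  # alarm threshold: messages waiting in the queue
    increase_percent: int   # instances to add, as a percentage of current size


# Values come from the slide; pairing them in ascending order is an assumption.
STEPS = (
    ScalingStep(500, 50),
    ScalingStep(1000, 100),
    ScalingStep(2000, 200),
)
COOLDOWN_SECONDS = 5 * 60  # "5m grace & cooldown"


def desired_capacity(current, queue_depth, seconds_since_last_scale, steps=STEPS):
    """Return the instance count after applying the largest breached step."""
    if seconds_since_last_scale < COOLDOWN_SECONDS:
        return current  # still cooling down, take no action
    breached = [s for s in steps if queue_depth > s.queue_depth_above]
    if not breached:
        return current
    step = max(breached, key=lambda s: s.queue_depth_above)
    # Round up so that small groups still gain at least one instance.
    return current + math.ceil(current * step.increase_percent / 100)
```

Evaluating only the largest breached step means a sudden jump in queue depth triggers one large increase rather than several small ones in sequence.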
  • 16. Scalable Messaging Architecture • Isolate different compute loads into independent Autoscaling groups • Leverage queuing between processing tiers • Scale up based on size of the preceding queue • Use multiple queues for differing priority • Use multiple scaling rules to handle logarithmic load increase • Improve scaling event response times by reducing instance boot time • Leverage AutoScaling for HA • Diagram labels: AutoScaling Group, SQS Queue, AutoScaling Group, SQS Queue, AutoScaling Group, CloudWatch
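The point about using multiple queues for differing priority can be illustrated with an in-memory stand-in for the SQS queues. This is a sketch only: the class `PriorityQueues` is hypothetical, and treating the Total Fire Ban queue as higher priority than the Incidents / Warnings queue is an assumption, since the slides name both queues without ranking them.

```python
from collections import deque


class PriorityQueues:
    """In-memory stand-in for several queues that are drained in priority order."""

    def __init__(self, names_by_priority):
        # Dicts keep insertion order, so the first name is the highest priority.
        self._queues = {name: deque() for name in names_by_priority}

    def put(self, name, message):
        self._queues[name].append(message)

    def get(self):
        """Return the oldest message from the highest-priority non-empty queue."""
        for queue in self._queues.values():
            if queue:
                return queue.popleft()
        return None  # every queue is empty

    def depth(self, name):
        """Queue depth, the figure a scaling rule would watch."""
        return len(self._queues[name])


# Assumed ordering: fire-ban messages are sent before routine incidents.
queues = PriorityQueues(["total_fire_ban", "incidents_warnings"])
queues.put("incidents_warnings", "incident-1")
queues.put("total_fire_ban", "ban-1")
first = queues.get()  # taken from total_fire_ban although it was queued later
```

Keeping the queues separate also gives each one its own depth, so each processing tier can scale on the backlog directly in front of it.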
  • 17. Testing • What works well at the small scale does not always translate to large scale • Playback of real events to simulate known situations • Test each component / tier independently in addition to the whole • Use mockups / stubs to simulate external entities • Know your user base and their platforms • Test your scaling capability and response time • Test your availability!
  • 18. Unit Testing • Replay data from a previous known event • Test processing tiers independently • Input and output should have a known correlation • Reuse input data from failed tests • Diagram labels: Load Generator, Message Feed, Master Node, Incidents / Warnings SQS Queue, Slave Nodes, Notif Batches SQS Queue, Sender Nodes, Mockup Notification Destination
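Replaying recorded input through a single tier and checking it against a known output might look like the sketch below. Everything in it is illustrative: `expand_incident` is a hypothetical tier that turns one incident into one notification per subscribed device, and the record layout (`input`, `expected`) is an assumption rather than the format used by the presenters.

```python
def expand_incident(incident, subscriptions):
    """Hypothetical tier: one notification per device watching the region."""
    devices = sorted(subscriptions.get(incident["region"], []))
    return [{"device": d, "incident_id": incident["id"]} for d in devices]


def replay(recorded_cases, tier, subscriptions):
    """Run each recorded input through one tier and collect the mismatches.

    The returned cases can be fed straight back in after a fix, which is
    the "reuse input data from failed tests" step.
    """
    failures = []
    for case in recorded_cases:
        actual = tier(case["input"], subscriptions)
        if actual != case["expected"]:
            failures.append(case)
    return failures


subscriptions = {"north": ["dev-2", "dev-1"]}
recorded_cases = [
    {
        "input": {"id": "inc-1", "region": "north"},
        "expected": [
            {"device": "dev-1", "incident_id": "inc-1"},
            {"device": "dev-2", "incident_id": "inc-1"},
        ],
    },
    {
        # No device is subscribed to "south", so this case is expected to fail.
        "input": {"id": "inc-2", "region": "south"},
        "expected": [{"device": "dev-9", "incident_id": "inc-2"}],
    },
]
failed_cases = replay(recorded_cases, expand_incident, subscriptions)
```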
  • 19. End to end load testing • Replay data from a previous known event, BUT add a load multiplier, e.g. 10x users then 100x users • Identify weak points in the architecture & fix moving forward • Identify Autoscaling parameters • Mockup destinations must be as simple as possible – 4 lines of PHP that dump the HTTP POST beats 1000s of lines of Java that add unnecessary complexity • Diagram labels: Load Generator, Message Feed, Master Node, Incidents / Warnings SQS Queue, Slave Nodes, Notif Batches SQS Queue, Sender Nodes, Mockup Notification Destination
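The load multiplier and the deliberately simple mock destination can be sketched together. This is an in-memory illustration under stated assumptions, not the presenters' test harness: `multiply_users`, `MockDestination`, and `run_load_test` are hypothetical names, synthetic device IDs are derived by suffixing the recorded ones, and the mock only records what it is given instead of accepting a real HTTP POST.

```python
from dataclasses import dataclass, field


def multiply_users(device_ids, factor):
    """Turn each recorded device into `factor` synthetic devices."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return [f"{device_id}-x{n}" for device_id in device_ids for n in range(factor)]


@dataclass
class MockDestination:
    """Records every payload and does nothing else, to stay out of the way."""

    received: list = field(default_factory=list)

    def post(self, payload):
        self.received.append(payload)


def run_load_test(device_ids, incident_id, factor, destination):
    """Send one notification per synthetic device; return how many arrived."""
    for device_id in multiply_users(device_ids, factor):
        destination.post({"device": device_id, "incident_id": incident_id})
    return len(destination.received)


recorded_devices = ["dev-1", "dev-2", "dev-3"]
delivered_10x = run_load_test(recorded_devices, "inc-1", 10, MockDestination())
delivered_100x = run_load_test(recorded_devices, "inc-1", 100, MockDestination())
```

Because the mock has no logic of its own, any loss or slowdown seen during the test can be attributed to the system under test rather than to the test fixture.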
  • 20. Auditability and Accounting (diagram) • Pipeline labels: Load Generator, Message Feed, Master Node, Incidents / Warnings SQS Queue, Slave Nodes, Notif Batches SQS Queue, Sender Nodes, Mockup Notification Destination • Audit labels: SNS Topic, SQS Queue, Event Processor, DynamoDB Table, Auditor
  • 21. Auditability and Accounting • Correlation, summaries, accuracy • Testing – 100% aim – Reproducibility – Rapid iterations • Operations – Throughput – Analysis of third parties' performance • Asynchronous is a must
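One way to picture the correlation step is an auditor that compares the notification IDs that should have been sent with those the destination actually received. The sketch below is an assumption-laden illustration: the slides do not define how accuracy was measured, so here it is taken to be the share of expected IDs that arrived at least once, and the function name `audit` and the report keys are hypothetical.

```python
from collections import Counter


def audit(expected_ids, delivered_ids):
    """Correlate expected and delivered notification IDs into a summary."""
    expected = set(expected_ids)
    counts = Counter(delivered_ids)
    delivered = set(counts)
    matched = expected & delivered
    # Assumed definition: share of expected notifications that arrived.
    accuracy = 100.0 * len(matched) / len(expected) if expected else 100.0
    return {
        "accuracy_percent": accuracy,
        "missing": sorted(expected - delivered),
        "unexpected": sorted(delivered - expected),
        "duplicates": sorted(i for i, n in counts.items() if n > 1),
    }


report = audit(["n1", "n2", "n3", "n4"], ["n1", "n2", "n2", "n9"])
```

Running the audit from recorded IDs, separately from the sending path, keeps it asynchronous so that accounting never slows delivery.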
  • 22. Uptake and success • User base tripled in 2 weeks • As of midnight on 17th Jan • Over 330k devices registered • 5.7M Total push notifications • 1.4M Warnings • 3.8M Incidents • #1 App on iTunes store
  • 23. Continuous Improvement • Infrastructure and application evolve in tandem • Accuracy – Started at 66% accuracy @ 1% peak simulated load – Finished at 100% accuracy @ 10% peak simulated load – 3 days elapsed time • Test frequency – Started with > 4 hour turnaround – Finished with < 1 hour turnaround
  • 24. “…had been outside just 20 minutes earlier checking…” “… just 30-40m away from the house.” “Without the app we wouldn’t have known.”