SlideShare a Scribd company logo
1 of 34
Protect your app from Outages
Nati Shalom CTO GigaSpaces
@natishalom
May 2013
 AWS and outages
 Outage impact
 Disaster Recovery – it’s all about redundancy!
 Cloudify as a solution for redundancy
 Demo with Cloudify on EC2
® Copyright 2013 GigaSpaces Ltd. All Rights Reserved2
AGENDA
3
AWS USAGE
• AWS – around 0.5M servers
• Facebook – less than 0.1M servers
• Google – around 1M servers
THE OUTAGE PROBLEM
4
OUTAGE – APRIL 21, 2011
® Copyright 2012 GigaSpaces Ltd. All Rights Reserved5
OUTAGE - JUNE 29, 2012
® Copyright 2012 GigaSpaces Ltd. All Rights Reserved6
OUTAGE - OCTOBER 22, 2012
® Copyright 2012 GigaSpaces Ltd. All Rights Reserved7
OUTAGE - CHRISTMAS EVE 2012
® Copyright 2012 GigaSpaces Ltd. All Rights Reserved8
NOT ONLY AMAZON
® Copyright 2012 GigaSpaces Ltd. All Rights Reserved9
 28 December 2012 - some owners of
Microsoft's XBox 360 gaming console were
unable to access some of their cloud-based
storage files.
 26 July 2012 - Service for Microsoft’s
Windows Azure Europe region went down for
more than two hours
 29 February 2012 - The ultimate result was
service impacts of 8-10 hours for users of
Azure data centers in Dublin, Ireland, Chicago,
and San Antonio.
10
THAT’S WHAT YOU EXPECT?
99% - 3.65 days downtime
99.9% - 8.76 hours downtime
99.99% - 53 minutes downtime
99.999% - 5.26 minutes downtime
® Copyright 2012 GigaSpaces Ltd. All Rights Reserved11
OUTAGE IMPACT – DESIGN FOR FAILURES
Outage could cost…
$89K per hour for Amadeus
$225K per hour for PayPal!
12
DISASTER RECOVERY
13
MULTI CLOUD
14
PREPARE FOR DISASTER RECOVERY
•Dedicated expert for DR architecture
•Define target recovery time & point
•Assume every tier can fail
•Use monitoring and alerts
•Document your operational processes
15
CHAOS MONKEY
16
17
CLONE YOUR ENVIORMENT
18
CLONE YOUR DATA
19
Leverage Existing Automation Frameworks
Configuration Centric APP Centric (PaaS)
CLONE YOUR ENV - HOW DOES IT WORK?
BUILT IN SUPPORT FOR MANAGING DATA IN THE CLOUD
Real Time Relational DB
Clusters
NoSQL Clusters Hadoop
Storm MySQL MongoDB Hadoop (Hive,
Pig,..)
Elastic Caching XAP Postgress Cassandra ZooKeeper
Couchbase
ElasticSearch
23
 Technology-based concrete
process control and information
service
 Deployments across North
America, Latin America, Asia, and
Europe for nearly a decade
 Part of W.R. Grace & Co , $6.3 B
Company.
 The problem: On-Demand HA/DR
over multiple Cloud regions.
CASE STUDY: VERIFI
24
High
Availability
Data
Replication
Disaster
Recovery
ELASTIC ON-DEMAND DISASTER RECOVERY
25
 Problem
 Can we eliminate the
RTO vs. Cost trade-off
in the cloud?
 Solution (Elastic DR)
 A hybrid between Hot
and Warm DR
 Switch to Active site
in matter of seconds
through cloud-
agnostic lifecycle
automation recipes
VERIFI (INITIAL) ARCHITECTURE
26
Availability region (US-West: Oregon)
Data Volume
Internet EC2 Instance
mod_cluster
EC2 Instance
JBoss
Data Volume
EC2 Instance
EC2 Instance
PostgresSQL
Cassandra
4 recipes
ELASTIC DR ON-DEMAND: FAILOVER SCENARIO
27
Region (US-West Oregon)
App Servers
PostgresSQL
Region (US-East Virginia)
PostgresSQL
Cloud #1 Cloud #2
Region (US-East Virginia )
PostgresSQL
Cloud #1 Cloud #2
App Servers
Region (US-West California)
PostgresSQL
Cloud #3
Region failure
occurs
* Initially, all those actions may be done manually by
Verifi’s Ops team (e.g.: via recipe commands in CLI)
Bootstrap another cloud in
a different region using the
same application recipe
used to bootstrap cloud #2
above*
Liveness poll
Liveness poll
Upon initial deployment, the primary deplyoment
of the application “verifi” will be bootstrapped
onto cloud #1, another slightly modified
application recipe “verifi_dr” will be bootstrapped
as cloud #2, polling cloud #1 for failure, and acting
as a PostgresSQL db slave.
Turn Postgres slave into
master, Start app server
instances*
FAILOVER SCENARIO
28
Region (US-West Oregon)
App Servers
PostgresSQL
Region (US-East Virginia)
PostgresSQL
Cloud #1 Cloud #2
Region (US-East Virginia )
PostgresSQL
Cloud #1 Cloud #2
App Servers
Region (US-West California)
PostgresSQL
Cloud #3
Region failure
occurs
Bootstrap another cloud in
a different region using the
same application recipe
used to bootstrap cloud #2
above*
Liveness poll
Liveness poll
Upon initial deployment, the primary deployment
of the application will be bootstrapped onto cloud
#1, another slightly modified application recipe
will be bootstrapped as cloud #2, polling cloud #1
for failure, and acting as a PostgresSQL db slave.
Turn Postgres slave into
master, Start app server
instances*
Copyright 2012 Gigaspaces. All Rights Reserved29
NEXT STEPS
Across clouds
(AWS, Rackspace, Azure…etc)
Across AWS regions
Across AWS zones
1 application
+ overrides
Several cloud
drivers
1 application
+ overrides
1 cloud driver
1 application +
overrides
1 cloud driver
Availability
Supported by
Verifi phase #1
Copyright 2013 Gigaspaces. All Rights Reserved30
ELASTIC ON-DEMAND DR: COSTS
Main Site (US-West) Warm DR Site (US-East) Hot DR Site
Cost $82,068 $12,625 $82,068
 Main Site
 1 Load balancer, 2 JBoss instances, 1 PostgreSQL master, 3 Cassandra
 DR Site
 1 PostgreSQL slave – All other instance start on demand upon failover
Copyright 2013 Gigaspaces. All Rights Reserved31
ELASTIC DR: WARM DR COST, CLOUD PORTABILITY
4 recipes
DR Site
$12k
SameRecipe
$14k
$6k
$5k
$9k
Copyright 2013 Gigaspaces. All Rights Reserved32
ELASTIC DR: HOT DR COST
4 recipes
DR Site
$82k
SameRecipe
$79k
$115k
$68k
$91k
 Disaster Recovery – it’s all about redundancy!
 Cloning your environment – app stack
 Cloning your Data – DB Replication
 Automation makes DR processes simple
 Use recipes to clone your app stack consistently
 Use replication to clone your data
 Leverage cloud economics to reduce the cost
 DR on Demand
 Multi Cloud
® Copyright 2013 GigaSpaces Ltd. All Rights Reserved33
SUMMARY
Thank You!
@natishalom
® Copyright 2013 GigaSpaces Ltd. All Rights Reserved34
QUESTIONS & ANSWERS

More Related Content

What's hot

CloudCrowd- Orbyte Presentation on Web 2.0 Trading
CloudCrowd- Orbyte Presentation on Web 2.0 TradingCloudCrowd- Orbyte Presentation on Web 2.0 Trading
CloudCrowd- Orbyte Presentation on Web 2.0 TradingNati Shalom
 
Giga spaces cloudify road map-3 (citi)
Giga spaces cloudify road map-3 (citi)Giga spaces cloudify road map-3 (citi)
Giga spaces cloudify road map-3 (citi)Nati Shalom
 
AWS Summit Berlin 2013 - Big Data Analytics
AWS Summit Berlin 2013 - Big Data AnalyticsAWS Summit Berlin 2013 - Big Data Analytics
AWS Summit Berlin 2013 - Big Data AnalyticsAWS Germany
 
Magento performance comparison - AWS vs DO
Magento performance comparison - AWS vs DOMagento performance comparison - AWS vs DO
Magento performance comparison - AWS vs DONeev Technologies
 
Kubernetes and Terraform in the Cloud: How RightScale Does DevOps
Kubernetes and Terraform in the Cloud: How RightScale Does DevOpsKubernetes and Terraform in the Cloud: How RightScale Does DevOps
Kubernetes and Terraform in the Cloud: How RightScale Does DevOpsRightScale
 
Level up your SQL and Azure, by using Rubrik
Level up your SQL and Azure, by using RubrikLevel up your SQL and Azure, by using Rubrik
Level up your SQL and Azure, by using RubrikJaap Brasser
 
Orchestrating PaaS and IaaS+ with RightScale
Orchestrating PaaS and IaaS+ with RightScaleOrchestrating PaaS and IaaS+ with RightScale
Orchestrating PaaS and IaaS+ with RightScaleRightScale
 
Managing Container-as-a-Service and Docker Clusters in the Cloud with RightScale
Managing Container-as-a-Service and Docker Clusters in the Cloud with RightScaleManaging Container-as-a-Service and Docker Clusters in the Cloud with RightScale
Managing Container-as-a-Service and Docker Clusters in the Cloud with RightScaleRightScale
 
7 Common Questions About a Cloud Management Platform
7 Common Questions About a Cloud Management Platform7 Common Questions About a Cloud Management Platform
7 Common Questions About a Cloud Management PlatformRightScale
 
10 Good Reasons: NetApp Ransomware Protection
10 Good Reasons: NetApp Ransomware Protection10 Good Reasons: NetApp Ransomware Protection
10 Good Reasons: NetApp Ransomware ProtectionNetApp
 
Building Hybrid Cloud Architectures with GigaSpaces XAP
Building Hybrid Cloud Architectures with GigaSpaces XAPBuilding Hybrid Cloud Architectures with GigaSpaces XAP
Building Hybrid Cloud Architectures with GigaSpaces XAPjimliddle
 
Intro to Cloudify
Intro to CloudifyIntro to Cloudify
Intro to CloudifyRon Zavner
 
Move Out of the Data Center to Reach More Customers
Move Out of the Data Center to Reach More CustomersMove Out of the Data Center to Reach More Customers
Move Out of the Data Center to Reach More CustomersAmazon Web Services
 
BlueData Integration with Cloudera Manager
BlueData Integration with Cloudera ManagerBlueData Integration with Cloudera Manager
BlueData Integration with Cloudera ManagerBlueData, Inc.
 
Tagging Best Practices for Cloud Governance
Tagging Best Practices for Cloud GovernanceTagging Best Practices for Cloud Governance
Tagging Best Practices for Cloud GovernanceRightScale
 
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...RightScale
 
AWS Summit Berlin 2013 - Euroforum - Moving an Entire Physical Data Center in...
AWS Summit Berlin 2013 - Euroforum - Moving an Entire Physical Data Center in...AWS Summit Berlin 2013 - Euroforum - Moving an Entire Physical Data Center in...
AWS Summit Berlin 2013 - Euroforum - Moving an Entire Physical Data Center in...AWS Germany
 
AWS Summit Berlin 2013 - Optimizing your AWS applications and usage to reduce...
AWS Summit Berlin 2013 - Optimizing your AWS applications and usage to reduce...AWS Summit Berlin 2013 - Optimizing your AWS applications and usage to reduce...
AWS Summit Berlin 2013 - Optimizing your AWS applications and usage to reduce...AWS Germany
 
Manage and Optimize Cloud Spend with RightScale Optima
Manage and Optimize Cloud Spend with RightScale OptimaManage and Optimize Cloud Spend with RightScale Optima
Manage and Optimize Cloud Spend with RightScale OptimaRightScale
 
Modern Scheduling for Modern Applications with Nomad
Modern Scheduling for Modern Applications with NomadModern Scheduling for Modern Applications with Nomad
Modern Scheduling for Modern Applications with NomadMitchell Pronschinske
 

What's hot (20)

CloudCrowd- Orbyte Presentation on Web 2.0 Trading
CloudCrowd- Orbyte Presentation on Web 2.0 TradingCloudCrowd- Orbyte Presentation on Web 2.0 Trading
CloudCrowd- Orbyte Presentation on Web 2.0 Trading
 
Giga spaces cloudify road map-3 (citi)
Giga spaces cloudify road map-3 (citi)Giga spaces cloudify road map-3 (citi)
Giga spaces cloudify road map-3 (citi)
 
AWS Summit Berlin 2013 - Big Data Analytics
AWS Summit Berlin 2013 - Big Data AnalyticsAWS Summit Berlin 2013 - Big Data Analytics
AWS Summit Berlin 2013 - Big Data Analytics
 
Magento performance comparison - AWS vs DO
Magento performance comparison - AWS vs DOMagento performance comparison - AWS vs DO
Magento performance comparison - AWS vs DO
 
Kubernetes and Terraform in the Cloud: How RightScale Does DevOps
Kubernetes and Terraform in the Cloud: How RightScale Does DevOpsKubernetes and Terraform in the Cloud: How RightScale Does DevOps
Kubernetes and Terraform in the Cloud: How RightScale Does DevOps
 
Level up your SQL and Azure, by using Rubrik
Level up your SQL and Azure, by using RubrikLevel up your SQL and Azure, by using Rubrik
Level up your SQL and Azure, by using Rubrik
 
Orchestrating PaaS and IaaS+ with RightScale
Orchestrating PaaS and IaaS+ with RightScaleOrchestrating PaaS and IaaS+ with RightScale
Orchestrating PaaS and IaaS+ with RightScale
 
Managing Container-as-a-Service and Docker Clusters in the Cloud with RightScale
Managing Container-as-a-Service and Docker Clusters in the Cloud with RightScaleManaging Container-as-a-Service and Docker Clusters in the Cloud with RightScale
Managing Container-as-a-Service and Docker Clusters in the Cloud with RightScale
 
7 Common Questions About a Cloud Management Platform
7 Common Questions About a Cloud Management Platform7 Common Questions About a Cloud Management Platform
7 Common Questions About a Cloud Management Platform
 
10 Good Reasons: NetApp Ransomware Protection
10 Good Reasons: NetApp Ransomware Protection10 Good Reasons: NetApp Ransomware Protection
10 Good Reasons: NetApp Ransomware Protection
 
Building Hybrid Cloud Architectures with GigaSpaces XAP
Building Hybrid Cloud Architectures with GigaSpaces XAPBuilding Hybrid Cloud Architectures with GigaSpaces XAP
Building Hybrid Cloud Architectures with GigaSpaces XAP
 
Intro to Cloudify
Intro to CloudifyIntro to Cloudify
Intro to Cloudify
 
Move Out of the Data Center to Reach More Customers
Move Out of the Data Center to Reach More CustomersMove Out of the Data Center to Reach More Customers
Move Out of the Data Center to Reach More Customers
 
BlueData Integration with Cloudera Manager
BlueData Integration with Cloudera ManagerBlueData Integration with Cloudera Manager
BlueData Integration with Cloudera Manager
 
Tagging Best Practices for Cloud Governance
Tagging Best Practices for Cloud GovernanceTagging Best Practices for Cloud Governance
Tagging Best Practices for Cloud Governance
 
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...
 
AWS Summit Berlin 2013 - Euroforum - Moving an Entire Physical Data Center in...
AWS Summit Berlin 2013 - Euroforum - Moving an Entire Physical Data Center in...AWS Summit Berlin 2013 - Euroforum - Moving an Entire Physical Data Center in...
AWS Summit Berlin 2013 - Euroforum - Moving an Entire Physical Data Center in...
 
AWS Summit Berlin 2013 - Optimizing your AWS applications and usage to reduce...
AWS Summit Berlin 2013 - Optimizing your AWS applications and usage to reduce...AWS Summit Berlin 2013 - Optimizing your AWS applications and usage to reduce...
AWS Summit Berlin 2013 - Optimizing your AWS applications and usage to reduce...
 
Manage and Optimize Cloud Spend with RightScale Optima
Manage and Optimize Cloud Spend with RightScale OptimaManage and Optimize Cloud Spend with RightScale Optima
Manage and Optimize Cloud Spend with RightScale Optima
 
Modern Scheduling for Modern Applications with Nomad
Modern Scheduling for Modern Applications with NomadModern Scheduling for Modern Applications with Nomad
Modern Scheduling for Modern Applications with Nomad
 

Similar to Disaster Recovery on Demand on the Cloud

Avoiding Cloud Outage
Avoiding Cloud OutageAvoiding Cloud Outage
Avoiding Cloud OutageNati Shalom
 
Backup on the cloud 10.1.13
Backup on the cloud 10.1.13Backup on the cloud 10.1.13
Backup on the cloud 10.1.132nd Watch
 
Cloudifying High Availability: The Case for Elastic Disaster Recovery
Cloudifying High Availability: The Case for Elastic Disaster RecoveryCloudifying High Availability: The Case for Elastic Disaster Recovery
Cloudifying High Availability: The Case for Elastic Disaster RecoveryAli Hodroj
 
Strategies for Seamless Backup and Disaster Recovery with AWS
Strategies for Seamless Backup and Disaster Recovery with AWSStrategies for Seamless Backup and Disaster Recovery with AWS
Strategies for Seamless Backup and Disaster Recovery with AWSAmazon Web Services
 
Running SQL Server on AWS | John McCormack | DataGrillen 2019
Running SQL Server on AWS | John McCormack | DataGrillen 2019Running SQL Server on AWS | John McCormack | DataGrillen 2019
Running SQL Server on AWS | John McCormack | DataGrillen 2019John McCormack
 
AWS re:Invent 2016: Reinventing Disaster Recovery Leveraging AWS Cloud Infras...
AWS re:Invent 2016: Reinventing Disaster Recovery Leveraging AWS Cloud Infras...AWS re:Invent 2016: Reinventing Disaster Recovery Leveraging AWS Cloud Infras...
AWS re:Invent 2016: Reinventing Disaster Recovery Leveraging AWS Cloud Infras...Amazon Web Services
 
Agile Infrastructure with Windows Azure
Agile Infrastructure with Windows AzureAgile Infrastructure with Windows Azure
Agile Infrastructure with Windows AzureHARMAN Services
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Precisely
 
4 C’s for Using Cloud to Support Scientific Research
4 C’s for Using Cloud to Support Scientific Research4 C’s for Using Cloud to Support Scientific Research
4 C’s for Using Cloud to Support Scientific ResearchAvere Systems
 
Deploying MediaWiki On IBM DB2 in The Cloud Presentation
Deploying MediaWiki On IBM DB2 in The Cloud PresentationDeploying MediaWiki On IBM DB2 in The Cloud Presentation
Deploying MediaWiki On IBM DB2 in The Cloud PresentationLeons Petražickis
 
Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20...
Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20...Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20...
Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20...Amazon Web Services
 
Disaster recovery sites on AWS: minimal costs maximum efficiency
Disaster recovery sites on AWS: minimal costs maximum efficiencyDisaster recovery sites on AWS: minimal costs maximum efficiency
Disaster recovery sites on AWS: minimal costs maximum efficiencyAmazon Web Services
 
AWS 201 - A Walk through the AWS Cloud: App Hosting on AWS - Games, Apps and ...
AWS 201 - A Walk through the AWS Cloud: App Hosting on AWS - Games, Apps and ...AWS 201 - A Walk through the AWS Cloud: App Hosting on AWS - Games, Apps and ...
AWS 201 - A Walk through the AWS Cloud: App Hosting on AWS - Games, Apps and ...Amazon Web Services
 
AWS Summit Benelux 2013 - Enterprise Applications on AWS
AWS Summit Benelux 2013 - Enterprise Applications on AWSAWS Summit Benelux 2013 - Enterprise Applications on AWS
AWS Summit Benelux 2013 - Enterprise Applications on AWSAmazon Web Services
 
데이터 마이그레이션 AWS와 같이하기 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
데이터 마이그레이션 AWS와 같이하기 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming데이터 마이그레이션 AWS와 같이하기 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
데이터 마이그레이션 AWS와 같이하기 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 GamingAmazon Web Services Korea
 
The DevOps PaaS Infusion - May meetup
The DevOps PaaS Infusion - May meetupThe DevOps PaaS Infusion - May meetup
The DevOps PaaS Infusion - May meetupNorm Leitman
 
Containerized Hadoop beyond Kubernetes
Containerized Hadoop beyond KubernetesContainerized Hadoop beyond Kubernetes
Containerized Hadoop beyond KubernetesDataWorks Summit
 
Cloud computing
Cloud computingCloud computing
Cloud computinggd1410
 
AWS Repatriation: Bring Your Apps Back
AWS Repatriation: Bring Your Apps BackAWS Repatriation: Bring Your Apps Back
AWS Repatriation: Bring Your Apps BackRandy Bias
 

Similar to Disaster Recovery on Demand on the Cloud (20)

Avoiding Cloud Outage
Avoiding Cloud OutageAvoiding Cloud Outage
Avoiding Cloud Outage
 
Backup on the cloud 10.1.13
Backup on the cloud 10.1.13Backup on the cloud 10.1.13
Backup on the cloud 10.1.13
 
Cloudifying High Availability: The Case for Elastic Disaster Recovery
Cloudifying High Availability: The Case for Elastic Disaster RecoveryCloudifying High Availability: The Case for Elastic Disaster Recovery
Cloudifying High Availability: The Case for Elastic Disaster Recovery
 
Strategies for Seamless Backup and Disaster Recovery with AWS
Strategies for Seamless Backup and Disaster Recovery with AWSStrategies for Seamless Backup and Disaster Recovery with AWS
Strategies for Seamless Backup and Disaster Recovery with AWS
 
Running SQL Server on AWS | John McCormack | DataGrillen 2019
Running SQL Server on AWS | John McCormack | DataGrillen 2019Running SQL Server on AWS | John McCormack | DataGrillen 2019
Running SQL Server on AWS | John McCormack | DataGrillen 2019
 
AWS re:Invent 2016: Reinventing Disaster Recovery Leveraging AWS Cloud Infras...
AWS re:Invent 2016: Reinventing Disaster Recovery Leveraging AWS Cloud Infras...AWS re:Invent 2016: Reinventing Disaster Recovery Leveraging AWS Cloud Infras...
AWS re:Invent 2016: Reinventing Disaster Recovery Leveraging AWS Cloud Infras...
 
Agile Infrastructure with Windows Azure
Agile Infrastructure with Windows AzureAgile Infrastructure with Windows Azure
Agile Infrastructure with Windows Azure
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 
4 C’s for Using Cloud to Support Scientific Research
4 C’s for Using Cloud to Support Scientific Research4 C’s for Using Cloud to Support Scientific Research
4 C’s for Using Cloud to Support Scientific Research
 
Deploying MediaWiki On IBM DB2 in The Cloud Presentation
Deploying MediaWiki On IBM DB2 in The Cloud PresentationDeploying MediaWiki On IBM DB2 in The Cloud Presentation
Deploying MediaWiki On IBM DB2 in The Cloud Presentation
 
Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20...
Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20...Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20...
Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20...
 
Disaster recovery sites on AWS: minimal costs maximum efficiency
Disaster recovery sites on AWS: minimal costs maximum efficiencyDisaster recovery sites on AWS: minimal costs maximum efficiency
Disaster recovery sites on AWS: minimal costs maximum efficiency
 
AWS 201 - A Walk through the AWS Cloud: App Hosting on AWS - Games, Apps and ...
AWS 201 - A Walk through the AWS Cloud: App Hosting on AWS - Games, Apps and ...AWS 201 - A Walk through the AWS Cloud: App Hosting on AWS - Games, Apps and ...
AWS 201 - A Walk through the AWS Cloud: App Hosting on AWS - Games, Apps and ...
 
AWS Summit Benelux 2013 - Enterprise Applications on AWS
AWS Summit Benelux 2013 - Enterprise Applications on AWSAWS Summit Benelux 2013 - Enterprise Applications on AWS
AWS Summit Benelux 2013 - Enterprise Applications on AWS
 
데이터 마이그레이션 AWS와 같이하기 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
데이터 마이그레이션 AWS와 같이하기 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming데이터 마이그레이션 AWS와 같이하기 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
데이터 마이그레이션 AWS와 같이하기 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
 
The DevOps PaaS Infusion - May meetup
The DevOps PaaS Infusion - May meetupThe DevOps PaaS Infusion - May meetup
The DevOps PaaS Infusion - May meetup
 
Containerized Hadoop beyond Kubernetes
Containerized Hadoop beyond KubernetesContainerized Hadoop beyond Kubernetes
Containerized Hadoop beyond Kubernetes
 
Cloud computing
Cloud computingCloud computing
Cloud computing
 
AWS Repatriation: Bring Your Apps Back
AWS Repatriation: Bring Your Apps BackAWS Repatriation: Bring Your Apps Back
AWS Repatriation: Bring Your Apps Back
 
Self-Service Supercomputing
Self-Service SupercomputingSelf-Service Supercomputing
Self-Service Supercomputing
 

More from Nati Shalom

Cloudify and terraform integration
Cloudify and terraform integrationCloudify and terraform integration
Cloudify and terraform integrationNati Shalom
 
Why NFV and Digital Transformation Projects Fail!
Why NFV and Digital Transformation Projects Fail! Why NFV and Digital Transformation Projects Fail!
Why NFV and Digital Transformation Projects Fail! Nati Shalom
 
Cloudify and terraform integration
Cloudify and terraform integrationCloudify and terraform integration
Cloudify and terraform integrationNati Shalom
 
1 cloud, 2 clouds, 3 clouds, tons...
1 cloud, 2 clouds, 3 clouds, tons...1 cloud, 2 clouds, 3 clouds, tons...
1 cloud, 2 clouds, 3 clouds, tons...Nati Shalom
 
Open Stack Days israel Keynote 2017
Open Stack Days israel Keynote 2017Open Stack Days israel Keynote 2017
Open Stack Days israel Keynote 2017Nati Shalom
 
What A No Compromises Hybrid Cloud Looks Like
What A No Compromises Hybrid Cloud Looks Like What A No Compromises Hybrid Cloud Looks Like
What A No Compromises Hybrid Cloud Looks Like Nati Shalom
 
Running OpenStack in Production
Running OpenStack in Production Running OpenStack in Production
Running OpenStack in Production Nati Shalom
 
Orchestration tool roundup kubernetes vs. docker vs. heat vs. terra form vs...
Orchestration tool roundup   kubernetes vs. docker vs. heat vs. terra form vs...Orchestration tool roundup   kubernetes vs. docker vs. heat vs. terra form vs...
Orchestration tool roundup kubernetes vs. docker vs. heat vs. terra form vs...Nati Shalom
 
Real World Example of Orchestrating Docker, Node JS, NFV on OpenStack
Real World Example of Orchestrating Docker, Node JS, NFV on OpenStackReal World Example of Orchestrating Docker, Node JS, NFV on OpenStack
Real World Example of Orchestrating Docker, Node JS, NFV on OpenStackNati Shalom
 
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...Nati Shalom
 
OpenStack Juno The Complete Lowdown and Tales from the Summit
OpenStack Juno The Complete Lowdown and Tales from the SummitOpenStack Juno The Complete Lowdown and Tales from the Summit
OpenStack Juno The Complete Lowdown and Tales from the SummitNati Shalom
 
Application and Network Orchestration using Heat & Tosca
Application and Network Orchestration using Heat & ToscaApplication and Network Orchestration using Heat & Tosca
Application and Network Orchestration using Heat & ToscaNati Shalom
 
Introduction to Cloudify for OpenStack users
Introduction to Cloudify for OpenStack users Introduction to Cloudify for OpenStack users
Introduction to Cloudify for OpenStack users Nati Shalom
 
Software Defined Operator
Software Defined OperatorSoftware Defined Operator
Software Defined OperatorNati Shalom
 
Is Orchestration the Next Big Thing in DevOps
Is Orchestration the Next Big Thing in DevOpsIs Orchestration the Next Big Thing in DevOps
Is Orchestration the Next Big Thing in DevOpsNati Shalom
 
When networks meets apps (open stack atlanta)
When networks meets apps (open stack atlanta)When networks meets apps (open stack atlanta)
When networks meets apps (open stack atlanta)Nati Shalom
 
Application Centric DevOps
Application Centric DevOpsApplication Centric DevOps
Application Centric DevOpsNati Shalom
 
Real-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormReal-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormNati Shalom
 
Disaster recovery on demand on the cloud
Disaster recovery on demand on the cloudDisaster recovery on demand on the cloud
Disaster recovery on demand on the cloudNati Shalom
 
Dont call me cache april 17
Dont call me cache april 17Dont call me cache april 17
Dont call me cache april 17Nati Shalom
 

More from Nati Shalom (20)

Cloudify and terraform integration
Cloudify and terraform integrationCloudify and terraform integration
Cloudify and terraform integration
 
Why NFV and Digital Transformation Projects Fail!
Why NFV and Digital Transformation Projects Fail! Why NFV and Digital Transformation Projects Fail!
Why NFV and Digital Transformation Projects Fail!
 
Cloudify and terraform integration
Cloudify and terraform integrationCloudify and terraform integration
Cloudify and terraform integration
 
1 cloud, 2 clouds, 3 clouds, tons...
1 cloud, 2 clouds, 3 clouds, tons...1 cloud, 2 clouds, 3 clouds, tons...
1 cloud, 2 clouds, 3 clouds, tons...
 
Open Stack Days israel Keynote 2017
Open Stack Days israel Keynote 2017Open Stack Days israel Keynote 2017
Open Stack Days israel Keynote 2017
 
What A No Compromises Hybrid Cloud Looks Like
What A No Compromises Hybrid Cloud Looks Like What A No Compromises Hybrid Cloud Looks Like
What A No Compromises Hybrid Cloud Looks Like
 
Running OpenStack in Production
Running OpenStack in Production Running OpenStack in Production
Running OpenStack in Production
 
Orchestration tool roundup kubernetes vs. docker vs. heat vs. terra form vs...
Orchestration tool roundup   kubernetes vs. docker vs. heat vs. terra form vs...Orchestration tool roundup   kubernetes vs. docker vs. heat vs. terra form vs...
Orchestration tool roundup kubernetes vs. docker vs. heat vs. terra form vs...
 
Real World Example of Orchestrating Docker, Node JS, NFV on OpenStack
Real World Example of Orchestrating Docker, Node JS, NFV on OpenStackReal World Example of Orchestrating Docker, Node JS, NFV on OpenStack
Real World Example of Orchestrating Docker, Node JS, NFV on OpenStack
 
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
 
OpenStack Juno The Complete Lowdown and Tales from the Summit
OpenStack Juno The Complete Lowdown and Tales from the SummitOpenStack Juno The Complete Lowdown and Tales from the Summit
OpenStack Juno The Complete Lowdown and Tales from the Summit
 
Application and Network Orchestration using Heat & Tosca
Application and Network Orchestration using Heat & ToscaApplication and Network Orchestration using Heat & Tosca
Application and Network Orchestration using Heat & Tosca
 
Introduction to Cloudify for OpenStack users
Introduction to Cloudify for OpenStack users Introduction to Cloudify for OpenStack users
Introduction to Cloudify for OpenStack users
 
Software Defined Operator
Software Defined OperatorSoftware Defined Operator
Software Defined Operator
 
Is Orchestration the Next Big Thing in DevOps
Is Orchestration the Next Big Thing in DevOpsIs Orchestration the Next Big Thing in DevOps
Is Orchestration the Next Big Thing in DevOps
 
When networks meets apps (open stack atlanta)
When networks meets apps (open stack atlanta)When networks meets apps (open stack atlanta)
When networks meets apps (open stack atlanta)
 
Application Centric DevOps
Application Centric DevOpsApplication Centric DevOps
Application Centric DevOps
 
Real-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormReal-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using Storm
 
Disaster recovery on demand on the cloud
Disaster recovery on demand on the cloudDisaster recovery on demand on the cloud
Disaster recovery on demand on the cloud
 
Dont call me cache april 17
Dont call me cache april 17Dont call me cache april 17
Dont call me cache april 17
 

Recently uploaded

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

Disaster Recovery on Demand on the Cloud

  • 1. Protect your app from Outages Nati Shalom CTO GigaSpaces @natishalom May 2013
  • 2.  AWS and outages  Outage impact  Disaster Recovery – it’s all about redundancy!  Cloudify as a solution for redundancy  Demo with Cloudify on EC2 ® Copyright 2013 GigaSpaces Ltd. All Rights Reserved2 AGENDA
  • 3. 3 AWS USAGE • AWS – around 0.5M servers • Facebook – less than 0.1M servers • Google – around 1M servers
  • 5. OUTAGE – APRIL 21, 2011 ® Copyright 2012 GigaSpaces Ltd. All Rights Reserved5
  • 6. OUTAGE - JUNE 29, 2012 ® Copyright 2012 GigaSpaces Ltd. All Rights Reserved6
  • 7. OUTAGE - OCTOBER 22, 2012 ® Copyright 2012 GigaSpaces Ltd. All Rights Reserved7
  • 8. OUTAGE - CHRISTMAS EVE 2012 ® Copyright 2012 GigaSpaces Ltd. All Rights Reserved8
  • 9. NOT ONLY AMAZON ® Copyright 2012 GigaSpaces Ltd. All Rights Reserved9  28 December 2012 - some owners of Microsoft's XBox 360 gaming console were unable to access some of their cloud-based storage files.  26 July 2012 - Service for Microsoft’s Windows Azure Europe region went down for more than two hours  29 February 2012 - The ultimate result was service impacts of 8-10 hours for users of Azure data centers in Dublin, Ireland, Chicago, and San Antonio.
  • 10. 10 THAT’S WHAT YOU EXPECT? 99% - 3.65 days downtime 99.9% - 8.76 hours downtime 99.99% - 53 minutes downtime 99.999% - 5.26 minutes downtime
  • 11. ® Copyright 2012 GigaSpaces Ltd. All Rights Reserved11 OUTAGE IMPACT – DESIGN FOR FAILURES Outage could cost… $89K per hour for Amadeus $225K per hour for PayPal!
  • 14. 14 PREPARE FOR DISASTER RECOVERY •Dedicated expert for DR architecture •Define target recovery time & point •Assume every tier can fail •Use monitoring and alerts •Document your operational processes
  • 16. 16
  • 19. 19
  • 20. Leverage Existing Automation Frameworks Configuration Centric APP Centric (PaaS)
  • 21. CLONE YOUR ENV - HOW DOES IT WORK?
  • 22. BUILT IN SUPPORT FOR MANAGING DATA IN THE CLOUD Real Time Relational DB Clusters NoSQL Clusters Hadoop Storm MySQL MongoDB Hadoop (Hive, Pig,..) Elastic Caching XAP Postgress Cassandra ZooKeeper Couchbase ElasticSearch
  • 23. 23
  • 24.  Technology-based concrete process control and information service  Deployments across North America, Latin America, Asia, and Europe for nearly a decade  Part of W.R. Grace & Co , $6.3 B Company.  The problem: On-Demand HA/DR over multiple Cloud regions. CASE STUDY: VERIFI 24 High Availability Data Replication Disaster Recovery
  • 25. ELASTIC ON-DEMAND DISASTER RECOVERY 25  Problem  Can we eliminate the RTO vs. Cost trade-off in the cloud?  Solution (Elastic DR)  A hybrid between Hot and Warm DR  Switch to Active site in matter of seconds through cloud- agnostic lifecycle automation recipes
  • 26. VERIFI (INITIAL) ARCHITECTURE 26 Availability region (US-West: Oregon) Data Volume Internet EC2 Instance mod_cluster EC2 Instance JBoss Data Volume EC2 Instance EC2 Instance PostgresSQL Cassandra 4 recipes
  • 27. ELASTIC DR ON-DEMAND: FAILOVER SCENARIO 27 Region (US-West Oregon) App Servers PostgresSQL Region (US-East Virginia) PostgresSQL Cloud #1 Cloud #2 Region (US-East Virginia ) PostgresSQL Cloud #1 Cloud #2 App Servers Region (US-West California) PostgresSQL Cloud #3 Region failure occurs * Initially, all those actions may be done manually by Verifi’s Ops team (e.g.: via recipe commands in CLI) Bootstrap another cloud in a different region using the same application recipe used to bootstrap cloud #2 above* Liveness poll Liveness poll Upon initial deployment, the primary deplyoment of the application “verifi” will be bootstrapped onto cloud #1, another slightly modified application recipe “verifi_dr” will be bootstrapped as cloud #2, polling cloud #1 for failure, and acting as a PostgresSQL db slave. Turn Postgres slave into master, Start app server instances*
  • 28. FAILOVER SCENARIO 28 Region (US-West Oregon) App Servers PostgresSQL Region (US-East Virginia) PostgresSQL Cloud #1 Cloud #2 Region (US-East Virginia ) PostgresSQL Cloud #1 Cloud #2 App Servers Region (US-West California) PostgresSQL Cloud #3 Region failure occurs Bootstrap another cloud in a different region using the same application recipe used to bootstrap cloud #2 above* Liveness poll Liveness poll Upon initial deployment, the primary deployment of the application will be bootstrapped onto cloud #1, another slightly modified application recipe will be bootstrapped as cloud #2, polling cloud #1 for failure, and acting as a PostgresSQL db slave. Turn Postgres slave into master, Start app server instances*
  • 29. Copyright 2012 Gigaspaces. All Rights Reserved29 NEXT STEPS Across clouds (AWS, Rackspace, Azure…etc) Across AWS regions Across AWS zones 1 application + overrides Several cloud drivers 1 application + overrides 1 cloud driver 1 application + overrides 1 cloud driver Availability Supported by Verifi phase #1
  • 30. Copyright 2013 Gigaspaces. All Rights Reserved30 ELASTIC ON-DEMAND DR: COSTS Main Site (US-West) Warm DR Site (US-East) Hot DR Site Cost $82,068 $12,625 $82,068  Main Site  1 Load balancer, 2 JBoss instances, 1 PostgreSQL master, 3 Cassandra  DR Site  1 PostgreSQL slave – All other instance start on demand upon failover
  • 31. Copyright 2013 Gigaspaces. All Rights Reserved31 ELASTIC DR: WARM DR COST, CLOUD PORTABILITY 4 recipes DR Site $12k SameRecipe $14k $6k $5k $9k
  • 32. Copyright 2013 Gigaspaces. All Rights Reserved32 ELASTIC DR: HOT DR COST 4 recipes DR Site $82k SameRecipe $79k $115k $68k $91k
  • 33.  Disaster Recovery – it’s all about redundancy!  Cloning your environment – app stack  Cloning your Data – DB Replication  Automation makes DR processes simple  Use recipes to clone your app stack consistently  Use replication to clone your data  Leverage cloud economics to reduce the cost  DR on Demand  Multi Cloud ® Copyright 2013 GigaSpaces Ltd. All Rights Reserved33 SUMMARY
  • 34. Thank You! @natishalom ® Copyright 2013 GigaSpaces Ltd. All Rights Reserved34 QUESTIONS & ANSWERS

Editor's Notes

  1. A high-ranking Amazon executive said there are 60,000 different customers across the various Amazon Web Services, and most of them are not the startups that are normally associated with on-demand computing. Rather the biggest customers in both number and amount of computing resources consumed are divisions of banks, pharmaceuticals companies and other large corporations who try AWS once for a temporary project, and then get hooked. According to Statspotting.com in March 2012 - researcher estimates that Amazon Web Services is using at least 454,400 servers in seven data center hubs around the globe. Let us try this: Google is powered by a million servers. Maybe a little more than that. And Amazon has half a million servers. Now, things fall in place. Facebook, the service that takes up one fourth of all our time online, is powered by less than 100,000 servers.Biggest customers – pinterest, instagram, Netflix, heroku, quora, foursquare etcAmazon Web Services runs more than 835,000 requests per second for hundreds of thousands of customers in 190 countries, including 300 government agencies and 1,500 educational institutions. 
  2. The Amazon cloud proved itself in that sufficient resources were available world-wide such that many well-prepared users could continue operating with relatively little downtime. But because Amazon’s reliability has been incredible, many users were not well-prepared leading to widespread outages.Amazon EC2 outage on April 2011 was the worst in cloud computing’s history back then. It made the front page of many news pages, including the New York Times, probably because many people were shocked by how many web sites and services rely on EC2.Microsoft Azure outageDec 28 2012 -  some owners of Microsoft's Xbox 360 game console were unable to access some of their cloud-based save storage files.July 26 - 2012 - Service for Microsoft’s Windows Azure Europe region went down for more than two hoursFeb 29 2012 - The ultimate result was service impacts of 8-10 hours for users of Azure data centers in Dublin, Ireland, Chicago, and San Antonio.
  3. Some parts of Amazon Web Services suffered a major outage. A portion of volumes utilizing the Elastic Block Store (EBS) service became "stuck" and were unable to fulfill read/write requests. It took at least two days for service to be fully restored. Reddit, one of the better-known sites to go down due to the error, said it has 700 EBS volumes with Amazon.Sites like Quora and Reddit were able to come back online in "read-only" mode, but users couldn't post new content for many hours.
  4. For second time in less than a month, Amazon’s Northern Virginia data center has suffered an outage and is impacting many popular services such as Instagram, Pinterest & Netflix.Several websites that rely on Amazon Web Services were taken offline due to a severe storm of historic proportions in the Northern Virginia area where Amazon's largest datacenter is located. Amazon previously suffered an outage in its Northern Virginia facilities on June 14, 2012.A line of severe storms packing winds of up to 80 mph has caused extensive damage and power outages in Virginia. Dominion Virginia Power crews are assessing damages and will be restoring power where safe to do so.
  5. A major outage occurred, affecting many sites such as reddit, Foursquare, Pinterest, and others. The cause was a latent bug in an operational data collection agent. A memory leak and a failed monitoring system caused the Amazon Web Services outage on Monday that took out Reddit and other major services.According to a post Friday night, AWS explained that the problem arose after a simple replacement of a data collection server. After installation, the server did not propagate its DNS address correctly and so a fraction of servers did not get the message. Those servers kept trying to reach the server, which led to a memory leak that then went out of control due to the failure of an internal monitoring alarm. Eventually the system ground to a virtual stop and millions of customers felt the pain.
  6. Amazon AWS again suffered an outage, causing websites such as Netflix instant video to be unavailable for some customers, particularly in the North-eastern US. Amazon later issued a statement detailing the issues with the Elastic Load Balancing service that led up to the outage.The disruption began shortly after noon Pacific time on December 24 when data was accidentally deleted by a developer during maintenance on the East Coast Elastic Load Balancing system, which is designed to distribute traffic volume among servers."Netflix is designed to handle failure of all or part of a single availability zone in a region as we run across three zones and operate with no loss of functionality on two," the company said in ablog post this afternoon. "We are working on ways of extending our resiliency to handle partial or complete regional outages."
  7. Fault tolerant systems are measured by their uptime / downtime for end usersAmazon says it is "committed" to a 99.95 percent uptime
  8. Although AWS went offline for a few hours only, the downtime experience did have an impact on customers’ businesses. There is no known data for the number of people affected by a cloud computing service outage. It is estimated that the travel service provider Amadeus loses $89,000 per hour during any cloud computing outage, while Paypal loses around $225,000 per hour.
  9. DR – The process and procedures you take to restore your system after catastrophic event.Cloud infrastructure has made DR much easier and affordable comparing to previous options.Cloud can also suffer from large scale failures because of network, power or any IT failures.Applications owners need to be responsible for HA and DR – can use multiple servers, AZ, regions and even clouds.Zones within a region share a LAN so they have high bandwidth, low latency and private IP access. Zones utilize separate power resources. Regions are “islands” – they share no resources.
  10. Each cloud is unique in many aspects offering different API and functionality to manage the resources.Different set of available resourcesDifferent format, encoding and versionsDifferent security groups, machine images, snapshots etc.
  11. Make sure to have a dedicated expert to manage your DR architecture, processes and testing.Define what your target recovery time and recovery point is.Be pessimistic and design for failures – (assume everything will fail and design a solution that is capable of handling it). Avoid single point of failures – all parts of your app should be highly available (different AZ / regions / cloud) – load balancers, app servers, web servers, message bus, database.Use monitoring and alerts for failover processes and for every change in state.Document your DR operational processes and automations.Try to “break” different part in your application. Try different ways to break it – unplug the network, turn machine off etc. Try it again.
  12. Netflix has open sourced ”Chaos Monkey,” its tool designed to purposely cause failure in order to increase the resiliency of an application in Amazon Web Services (AWS.)It’s a timely move as AWS has had its fair share of outages. With tools like Chaos Monkey, companies can be better prepared when a cloud infrastructure has a failure.In a blog post, Netflx says that this is the first of several tools that it will open source to help companies better manage the services they run in cloud infrastructures. Next up is likely to be Janitor Monkey which helps keep an environment tidy and costs down.Chaos Monkey has achieved its own fame for its innovative approach. According to Netflix, the tool “randomly disables production instances to make sure it can survive common types of failure without any customer impact. The name comes from the idea of unleashing a wild monkey with a weapon in your data center (or cloud region) to randomly shoot down instances and chew through cables — all the while we continue serving our customers without interruption.”
  13. Netflix provides an excellent toolset for surving outages at the operation level.In this part i wanted to zoom-in more on the design implication of our application.The core principle for surviving failure is actually fairly simple and in fact applies to any systems not just cloud whether they happen to be Airplane, Missiles, Cars etc.. At the end its all about redundancy. The degree of tolerance is often determined by how many alternate systems or parts of the system we have in our design and how much they are separated from one another. The degree tolerance is also determined by how fast we can detect the broken part in our system and make the switch. In software terms the common parts that comprises our system is built out of two main groups - the business logic and the data.Making a redundant software application that can survive failure is often based on setting up clones for two of those  parts of our system.
  14. We need abstraction – we don’t want to be locked in. We want to use tools that offer this abstraction layers both for daily management and for DR. This tool should translate our architecture concepts to the cloud specific properties (using recipes).To clone our application business logic we need to be able to ensure that all parts of our system runs the exact same version of all our software components . That include not just the binaries but also the configuration, the scripts that runs our application and more importantly that all our post deployment procedures such as fail-over, scaling and monitoring are also kept consistent. Quite often the things that makes the cloning of our business logic complex is due to the fact that the information on how to run our application is often scattered in many different sources such as scripts, as well as the mind of the people that runs those apps. To make the job of cloning our application much simpler and thus more consistent we need to be able to capture all parts of the information for running our apps in the same place. Configuration management tools such as Chef, Puppet and in the case of Amazon CloudFormation can help on this regard. 
  15. RDS read replica - Amazon RDS uses MySQL’s built-in replication functionality to create a special type of DB Instance called a Read Replica that allows you to elastically scale out beyond the capacity constraints of a single DB Instance for read-heavy database workloads. Once you create a Read Replica, database updates on the source DB Instance are replicated to the Read Replica using MySQL’s native, asynchronous replication. Since Read Replicas leverage standard MySQL replication, they may fall behind their sources, and they are therefore not intended to be used for enhancing fault tolerance in the event of source DB Instance failure or Availability Zone failure.
  16. There are lots of patterns on how to avoid failure.It took Netflix lots of development work to build a framework that can handle them well.Most users, startup don't have the luxury of implementing them themselves. You need a tool that will enable you to automate those patterns in a consistent way.  - Enter Cloudify
  17. Any App, Any Stack — Move your application to the cloud without making any code changes, regardless of the application stack (Java/Spring, Java EE, Ruby on Rails, …), database store (relational such as MySQL or non-relational such as Apache Cassandra), or any other middleware components it uses. This enables you to achieve your objective of no code changes.To make the work of setting all this work simpler we tried to bake all those patterns into a readymade tools and are scripted into out of the box recipes. The cloudify recipes includes: Database cluster recipes with support for MySQL, MongoDB, Cassandra, Postgress etc..Integration with Chef and Puppet Automation of fail-over, scaling and continues maintenance of our application.Application recipes that allows you to capture all the aspect of running your application including the post deployment aspect such as fail-over, scaling and monitoring.
  18. There are lots of patterns on how to avoid failure.It took Netflix lots of development work to build a framework that can handle them well.Most users, startup don't have the lactury of implementing them themselves. You need a tool that will enable you to automate those patterns in a consistent way.  - Enter Cloudify
  19. Cloud brings lots of promise for making our business more agile.Cloud has also become a huge shared infrastructure in which every failure has a much more significant impact on our business world wide.The experience in the past year had tought us that even a robust cloud infrastructure such as Amazon can fail. Through this experience we've learned that rather than relying on the infrastructure for preventing failure we need to design our system to cope with failure and get used to failure as away of life. Having said that the investment required to build a robust application can be fairly large and not something that everyone can afford.Using tools like Cloudify, Chef Puppet and if your a pure Amazon shop Netflix <framework> could help greatly to reduce this effort by making a lot of those patterns pre-backed into recipes.