Disaster recovery with OpenNebula
Carlo Daffara
First, let me get
some coffee.
“Disaster recovery (DR) involves a set of policies and
procedures to enable the recovery or continuation of vital
technology infrastructure and systems following a natural
or human-induced disaster. Disaster recovery focuses on
the IT or technology systems supporting critical business
functions, as opposed to business continuity, which
involves keeping all essential aspects of a business
functioning despite significant disruptive events. Disaster
recovery is therefore a subset of business continuity.”
80% of businesses affected by a major
incident either never re-open or close
within 18 months (Source: Axa)
From “Understanding the Cost of Data Center Downtime: An Analysis of the Financial Impact on Infrastructure Vulnerability”, Ponemon Research
“Let’s begin with one very interesting fact. According to a
survey completed in 2010, human error is responsible for
40% of all data loss, as compared to just 29% for hardware
or system failures. An earlier IBM study determined data
loss due to human error was as high as 80%” (From:
“Business continuity and disaster recovery planning for IT
professionals”, Elsevier press, 2014)
The recovery time objective (RTO) is the targeted duration of
time and a service level within which a business process must
be restored after a disaster (or disruption) in order to avoid
unacceptable consequences associated with a break in
business continuity.
The recovery point objective (RPO) is the maximum tolerable
period in which data might be lost from an IT service due to a
major incident.
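As a rough illustration of how the two objectives constrain a design, the sketch below checks a replication schedule against a per-VM target. The `VmPolicy` class, the VM name and all numbers are hypothetical, and it assumes that worst-case data loss equals the snapshot interval plus the time needed to ship the snapshot off site.

```python
from dataclasses import dataclass


@dataclass
class VmPolicy:
    """Hypothetical per-VM recovery targets, in minutes."""
    name: str
    rpo_minutes: int
    rto_minutes: int


def worst_case_data_loss(snapshot_interval_min: int, transfer_min: int) -> int:
    # Assumption: a disaster strikes just before the next snapshot
    # finishes transferring, so both intervals are lost.
    return snapshot_interval_min + transfer_min


def meets_rpo(policy: VmPolicy, snapshot_interval_min: int, transfer_min: int) -> bool:
    return worst_case_data_loss(snapshot_interval_min, transfer_min) <= policy.rpo_minutes


db = VmPolicy("billing-db", rpo_minutes=60, rto_minutes=30)
print(meets_rpo(db, snapshot_interval_min=720, transfer_min=20))  # 12-hour snapshots
print(meets_rpo(db, snapshot_interval_min=30, transfer_min=20))   # 30-minute snapshots
```

A VM that fails the check needs either a shorter snapshot interval or continuous replication.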
“Alternative storage-based replication solutions cost a
minimum of $10,000 per terabyte of data covered plus
ongoing maintenance. For the composite organization’s
225 protected VMs with an average size of 100 gigabytes
(GB), the three year costs for licenses and maintenance are
estimated at $328,500” (Forrester research, “The Total
Economic Impact of VMware vCenter Site Recovery
Manager”, 2013)
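The figures in the quote can be turned into a quick floor estimate for storage-based replication. This is only arithmetic on the quoted numbers, assuming decimal units (1 TB = 1000 GB) and ignoring the ongoing maintenance the quote mentions.

```python
protected_vms = 225        # from the quote
avg_vm_size_gb = 100       # from the quote
cost_per_tb_usd = 10_000   # quoted minimum for storage-based replication

# Assumption: decimal units, 1 TB = 1000 GB.
total_tb = protected_vms * avg_vm_size_gb / 1000
minimum_cost_usd = total_tb * cost_per_tb_usd

print(f"{total_tb} TB -> ${minimum_cost_usd:,.0f}")  # 22.5 TB -> $225,000
```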
3 simple rules to make a working DR:
Rule 1: never put all your eggs in one
basket (be it hardware, software, cloud)
Customer buys full DR and snapshot capability from local
data center; data center updates SAN firmware and loses
everything. Customer discovers that snapshots and
backups were kept in the same SAN with everything else.
In electronics, an opto-isolator, also called an optocoupler,
photocoupler, or optical isolator, is a component that transfers
electrical signals between two isolated circuits by using light.
Opto-isolators prevent high voltages from affecting the system
receiving the signal.
Rule 2: RTO and RPO are usually
different from VM to VM
Needs to be
replicated
constantly
No one cares
if this dies
Rule 3: design a reliable oracle
Oracle of
Delphi
How the others do it:
How we do it:
Our approach takes advantage of three
individual factors:
● LizardFS’ thinly-provisioned snapshots
● online replication of chunks & tiering
● OpenNebula’s datastores
# An example of configuration of goals. It contains the default values.
1 1 : _
2 2 : _ _
3 3 : _ _ _
4 4 : _ _ _ _
5 5 : _ _ _ _ _
# (...)
20 20 : _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
# But you don't have to specify all of them -- defaults will be assumed.
# You can define your own custom goals using labels if you use them, e.g.:
# 14 min_two_locations: _ locationA locationB # one copy in A, one in B, third anywhere
# 15 fast_access : ssd _ _ # one copy on ssd, two additional on any drives
# 16 two_manufacturers: WD HT # one on WD disk, one on HT disk
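To make the goal syntax concrete, here is a minimal sketch that reads lines of the form `id name : label label ...` into a dictionary. The grammar is inferred only from the example above; `parse_goals` is a hypothetical helper, not LizardFS's own parser.

```python
def parse_goals(text: str) -> dict:
    """Map goal id -> (goal name, list of chunkserver labels).

    Assumes the 'id name : labels' layout shown in the example;
    '_' stands for 'any chunkserver'.
    """
    goals = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        head, _, labels = line.partition(":")
        goal_id, name = head.split()
        goals[int(goal_id)] = (name, labels.split())
    return goals


sample = """\
# defaults
3 3 : _ _ _
14 min_two_locations: _ locationA locationB # one copy in A, one in B, third anywhere
"""

goals = parse_goals(sample)
for goal_id, (name, labels) in goals.items():
    print(goal_id, name, f"{len(labels)} copies", labels)
```

The number of labels is the number of copies kept, so a goal naming two locations is what places one replica of each chunk at the near DR site.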
● Most disasters are “local”, for example a fire
in the server room or a flood
● Two different DR sites, one near (e.g. next
building/other side of the building) and one
far (external datacenter)
● The near DR site receives a copy of the chunks that
are part of the marked datastores
● Remote snapshots are handled in the same
way: we take a full snapshot of the
datastore, and differentially replicate it
● We use the “snapshot of snapshot” approach
to avoid the cost of deduplication
● This way we can prioritize sync queues, and
at the receiving end we get a complete,
decoupled and working OpenNebula
For example, average dedup cost for ZFS: 5 to 30 GB of dedup table data for every TB of pool data, assuming an average block size of 64K.
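The lower end of that range can be reproduced with simple arithmetic. The sketch assumes binary units and roughly 320 bytes per dedup table entry, a commonly quoted ballpark that is not stated on the slide; the upper end of the range is not derived here.

```python
TIB = 2**40
KIB = 2**10


def dedup_table_bytes(pool_bytes: int, block_bytes: int, bytes_per_entry: int) -> int:
    """One table entry per block in the pool (assumes every block is unique)."""
    return (pool_bytes // block_bytes) * bytes_per_entry


# Assumption: about 320 bytes per entry.
size = dedup_table_bytes(TIB, 64 * KIB, 320)
print(size / 2**30, "GiB of dedup table per TiB of pool")  # 5.0
```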
Local (VM changes only in snapshots):
/var/lib/one/datastore → DRSNAP12H
/var/lib/one/snapshots → <yyyymmddhh> → DRSNAP12H

Remote (no chunk changes in snapshots; inplace rsync, 25x speedup):
/var/lib/one/datastore → DRSNAP12H
/var/lib/one/snapshots → <yyyymmddhh> → DRSNAP12H
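A minimal sketch of the naming and transfer steps suggested by the layout above. It assumes the arrows denote directory nesting, and the remote target is a hypothetical host; only the command line is built, nothing is executed. `--inplace` makes rsync update the destination file directly instead of writing a fresh copy and renaming it, which on a snapshotting store should leave unchanged chunks shared with earlier snapshots.

```python
from datetime import datetime
from pathlib import PurePosixPath

SNAPSHOT_ROOT = PurePosixPath("/var/lib/one/snapshots")


def snapshot_dir(taken_at: datetime, label: str = "DRSNAP12H") -> PurePosixPath:
    """Build <root>/<yyyymmddhh>/<label>; the nesting is an assumption."""
    return SNAPSHOT_ROOT / taken_at.strftime("%Y%m%d%H") / label


def rsync_command(src: PurePosixPath, remote: str) -> list:
    """Argument list for an in-place sync (not executed here)."""
    return ["rsync", "-a", "--inplace", f"{src}/", remote]


src = snapshot_dir(datetime(2016, 10, 25, 12))
# "dr-site" is a hypothetical remote host.
print(" ".join(rsync_command(src, "dr-site:/var/lib/one/datastore/DRSNAP12H/")))
```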
virsh# domblkstat instance-0012 --device vda
vda rd_req 128
vda rd_bytes 2344448
vda wr_req 234
vda wr_bytes 618496
vda flush_operations 2
vda rd_total_times 106512819
vda wr_total_times 960359872
vda flush_total_times 1741727
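The slide does not say how these counters are used; one plausible use is to sample them twice and rank VMs by write volume when prioritizing sync queues. The sketch below parses the output shown above; `parse_domblkstat` and `write_rate` are hypothetical helpers, and the second sample is invented.

```python
def parse_domblkstat(output: str) -> dict:
    """Turn 'device counter value' lines into {device: {counter: int}}."""
    stats = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2].isdigit():
            device, counter, value = parts
            stats.setdefault(device, {})[counter] = int(value)
    return stats


def write_rate(before: dict, after: dict, seconds: float, device: str = "vda") -> float:
    """Average bytes written per second between two samples."""
    return (after[device]["wr_bytes"] - before[device]["wr_bytes"]) / seconds


sample = """\
virsh# domblkstat instance-0012 --device vda
vda rd_req 128
vda rd_bytes 2344448
vda wr_req 234
vda wr_bytes 618496
vda flush_operations 2
"""

first = parse_domblkstat(sample)
# Invented second sample, taken 60 seconds later.
second = {"vda": {"wr_bytes": first["vda"]["wr_bytes"] + 6000}}
print(write_rate(first, second, seconds=60))  # 100.0 bytes/s
```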
Our “pilot light” approach: a running OpenNebula on two
nodes, with its own LizardFS store. Running only two VMs: the
Oracle and the Tester
The Oracle checks if DR is needed, and may need a human
confirmation for execution of the DR failover. If confirmation
is given, it takes the latest valid snapshotted datastore,
softlinks it and imports the VMs (through snapshots, so it’s
instantaneous)
The Tester makes a snapshot of the current stable snapshot,
imports the VMs and runs them in a separate, non-routed
vnet, then executes a test to see if everything works (workload
dependent), then deletes the intermediate snapshots
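The Oracle's decision can be sketched as two small functions. Everything here is hypothetical: the failure threshold, the `Snapshot` record, and the choice to always require human confirmation (the text says only that it may be needed). It also assumes that a snapshot is valid once the Tester has passed it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    name: str        # e.g. "2016102512", following the <yyyymmddhh> pattern
    tested_ok: bool  # set by the Tester after its workload check


def latest_valid(snapshots: list) -> Optional[Snapshot]:
    """Newest snapshot that passed testing; yyyymmddhh names sort chronologically."""
    valid = [s for s in snapshots if s.tested_ok]
    return max(valid, key=lambda s: s.name) if valid else None


def should_fail_over(failed_checks: int, threshold: int, human_confirmed: bool) -> bool:
    """Fail over only after repeated failed checks and explicit confirmation."""
    return failed_checks >= threshold and human_confirmed


snaps = [Snapshot("2016102500", True), Snapshot("2016102512", False)]
if should_fail_over(failed_checks=3, threshold=3, human_confirmed=True):
    print("fail over to", latest_valid(snaps).name)
```

Requiring several consecutive failed checks keeps a short network glitch from triggering a failover.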
Only critical VMs (RTO < 30 minutes) are run this way
For the VMs with higher RTO, buy one week of hardware on
demand, auto-install a node with Puppet or Ansible, and make
it join the OpenNebula cloud
Usually deployed in 30 minutes. Other vendors guarantee <15 minutes.
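The split between the two recovery paths amounts to a threshold on each VM's RTO. A minimal sketch, with hypothetical VM names and tier labels:

```python
def recovery_tier(rto_minutes: int) -> str:
    """Pilot-light cloud for RTO under 30 minutes, on-demand hardware otherwise."""
    return "pilot-light" if rto_minutes < 30 else "on-demand-hardware"


# Hypothetical inventory: VM name -> RTO in minutes.
vms = {"billing-db": 15, "log-archive": 240}
plan = {name: recovery_tier(rto) for name, rto in vms.items()}
print(plan)
```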
Ideal for harsh indoor environments that
require protection from falling dirt or liquid,
dust, light splashing, oil or coolant seepage.
Its NEMA Zone 4 rating also makes it perfect
for facilities located in earthquake-prone
seismic zones or any environment prone to
extreme vibration such as factories, power
stations, construction areas, shipping
facilities, warehouses, processing plants,
railroads, airports and military installations.
● Have a “big red button” to stop DR if
needed. Sometimes you are already fighting
fire here, and you know it’s better not to
move everything in flight.
● Have two people that are competent as DR
firefighters, and give them a second phone
with a rechargeable card. And make sure
both don’t go on vacation together. (Hint:
don’t choose two married people)
● Use a gateway machine to provide a
consistent internal IP scheme, and two
different configurations for the gateway
router to provide unmodified routing for the
remaining VMs
● Aggregate functionality in a single VM (for
example, one that manages logs) to
optimize writes
● I favor consistency, so I tend to avoid
application-level replication, unless it’s
native to the app (e.g. NoSQL). Otherwise
you have different solutions for different
machines (e.g. quorum group in MS
replication with same UUID…)
● Try to reduce write amplification for
databases, especially MySQL, e.g. with TokuDB
and its fractal tree indexes
Thank you!
Carlo Daffara
@cdaffara
linkedin.com/in/cdaffara

Disaster recovery with open nebula

  • 1. Disaster recovery with OpenNebula Carlo Daffara
  • 2. First, let me get some coffee.
  • 3.
  • 4.
  • 5.
  • 6. “Disaster recovery (DR) involves a set of policies and procedures to enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster. Disaster recovery focuses on the IT or technology systems supporting critical business functions, as opposed to business continuity, which involves keeping all essential aspects of a business functioning despite significant disruptive events. Disaster recovery is therefore a subset of business continuity.”
  • 7. 80% of businesses affected by a major incident either never re-open or close within 18 months (Source: Axa)
  • 8. From “Understanding the Cost of Data Center Downtime: An Analysis of the Financial Impact on Infrastructure Vulnerability”, Ponemon Research
  • 9. “Let’s begin with one very interesting fact. According to a survey completed in 2010, human error is responsible for 40% of all data loss, as compared to just 29% for hardware or system failures. An earlier IBM study determined data loss due to human error was as high as 80%” (From: Business continuity and disaster recovery planning for IT professionals”, Elsevier press, 2014)
  • 10.
  • 11.
  • 12.
  • 13. The recovery time objective (RTO) is the targeted duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity. The recovery point objective (RPO), is the maximum tolerable period in which data might be lost from an IT service due to a major incident.
  • 14. “Alternative storage-based replication solutions cost a minimum of $10,000 per terabyte of data covered plus ongoing maintenance. For the composite organization’s 225 protected VMs with an average size of 100 gigabytes (GB), the three year costs for licenses and maintenance are estimated at $328,500” (Forrester research, “The Total Economic Impact of VMware vCenter Site Recovery Manager”, 2013)
  • 15. 3 simple rules to make a working DR:
  • 16. Rule 1: never put all eggs in one basket (be it hardware, software, cloud)
  • 17.
  • 18. Customer buys full DR and snapshot capability from local data center; data center updates SAN firmware and loses everything. Customer discovers that snapshots and backups were kept in the same SAN with everything else.
  • 19.
  • 20. In electronics, an opto-isolator, also called an optocoupler, photocoupler, or optical isolator, is a component that transfers electrical signals between two isolated circuits by using light. Opto-isolators prevent high voltages from affecting the system receiving the signal.
  • 21.
  • 22. Rule 2: RTO and RPO are usually different from VM to VM
  • 23.
  • 24.
  • 25. Needs to be replicated constantly No one cares if this dies
  • 26.
  • 27.
  • 28. Rule 3: design a reliable oracle
  • 29.
  • 30.
  • 32. How the others do it:
  • 33.
  • 34.
  • 35. How we do it:
  • 36.
  • 37. Our approach takes advantage of three individual factors: ● LizardFS’ thinly-provisioned snapshots ● online replication of chunks & tiering ● OpenNebula’s datastores
  • 38.
  • 39.
  • 40. # An example of configuration of goals. It contains the default values.
        1 1 : _
        2 2 : _ _
        3 3 : _ _ _
        4 4 : _ _ _ _
        5 5 : _ _ _ _ _
        # (...)
        20 20 : _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
        # But you don't have to specify all of them -- defaults will be assumed.
        # You can define your own custom goals using labels if you use them, e.g.:
        # 14 min_two_locations: _ locationA locationB # one copy in A, one in B, third anywhere
        # 15 fast_access : ssd _ _ # one copy on ssd, two additional on any drives
        # 16 two_manufacturers: WD HT # one on WD disk, one on HT disk
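To make the goal syntax concrete, here is a minimal sketch of a parser for lines of the form `id name : label label ...`, where `_` means "any chunkserver". The function `parse_goals` and the sample text are hypothetical helpers for illustration, not part of LizardFS; the custom goals are the commented-out examples from the slide, uncommented here.

```python
def parse_goals(text):
    """Parse goal definitions into {name: (id, [labels])}.

    Each non-comment line looks like: "<id> <name> : <label> <label> ...".
    The number of labels is the number of copies kept of each chunk.
    """
    goals = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        head, _, labels = line.partition(":")
        goal_id, name = head.split()
        goals[name] = (int(goal_id), labels.split())
    return goals


# Illustrative input: one default goal plus two custom goals from the slide.
SAMPLE = """\
3 3 : _ _ _
14 min_two_locations: _ locationA locationB  # one copy in A, one in B, third anywhere
15 fast_access : ssd _ _                     # one copy on ssd, two on any drives
"""

goals = parse_goals(SAMPLE)
for name, (goal_id, labels) in goals.items():
    print(f"goal {goal_id} ({name}): {len(labels)} copies -> {labels}")
```

Reading the labels this way shows why a goal such as `min_two_locations` is useful for DR: it pins one copy of every chunk to each named location.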
  • 41. ● Most disasters are “local”, for example a fire in the server room or a flood ● Use two different DR sites, one near (e.g. the next building or the other side of the building) and one far (an external datacenter) ● The near DR site receives a copy of the chunks that are part of the marked datastores
  • 42.
  • 43. ● Remote snapshots are handled in the same way: we take a full snapshot of the datastore, and differentially replicate it ● We use the “snapshot of snapshot” approach to avoid the cost of deduplication. For example, the average dedup cost for ZFS is 5 to 30 GB of dedup table data for every TB of pool data, assuming an average block size of 64K ● This way we can prioritize sync queues, and on the receiving end we get a complete, decoupled and working OpenNebula
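The lower end of the quoted ZFS figure can be reproduced with simple arithmetic. The sketch below assumes binary units and about 320 bytes per dedup table entry, a commonly cited rule of thumb that is not stated on the slide; the upper end of the range is not derived here.

```python
TIB = 2**40
GIB = 2**30
KIB = 2**10

# Assumption (not from the slide): roughly 320 bytes per dedup table entry.
DDT_ENTRY_BYTES = 320


def ddt_size_bytes(pool_bytes, block_bytes, entry_bytes=DDT_ENTRY_BYTES):
    """One dedup table entry per unique block: blocks * bytes per entry."""
    return (pool_bytes // block_bytes) * entry_bytes


blocks = TIB // (64 * KIB)
table = ddt_size_bytes(TIB, 64 * KIB)
print(f"{blocks:,} blocks of 64K per TiB")
print(f"dedup table: {table / GIB:.1f} GiB per TiB of pool data")
```

Under these assumptions the table must hold one entry for each of roughly 16.8 million blocks per terabyte, which is the overhead the “snapshot of snapshot” approach avoids.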
  • 44. Local (VM changes only in snapshots):
        /var/lib/one/datastore
          ↓ DRSNAP12H
        /var/lib/one/snapshots
          ↓ <yyyymmddhh>
            ↓ DRSNAP12H
        Remote (no chunk changes in snapshots; inplace rsync, 25x speedup):
        /var/lib/one/datastore
          ↓ DRSNAP12H
        /var/lib/one/snapshots
          ↓ <yyyymmddhh>
            ↓ DRSNAP12H
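One plausible reading of this layout is sketched below: the local snapshot directory named `<yyyymmddhh>` is synced into the remote datastore with rsync's `--inplace` option, so that unchanged blocks are not rewritten on the remote side. The paths and the `DRSNAP12H` name come from the slide; the helper names, the host `dr-far.example.org` and the choice of source and destination are assumptions. The code only builds the command and does not run it.

```python
from datetime import datetime

DATASTORE_ROOT = "/var/lib/one/datastore"  # path from the slide
SNAPSHOT_ROOT = "/var/lib/one/snapshots"   # path from the slide
DATASTORE_NAME = "DRSNAP12H"               # datastore name from the slide


def snapshot_name(ts):
    """Name a snapshot after its hour, in the <yyyymmddhh> form."""
    return ts.strftime("%Y%m%d%H")


def rsync_command(ts, remote_host):
    """Build (not run) an in-place rsync from a local snapshot to the remote datastore."""
    src = f"{SNAPSHOT_ROOT}/{snapshot_name(ts)}/{DATASTORE_NAME}/"
    dst = f"{remote_host}:{DATASTORE_ROOT}/{DATASTORE_NAME}/"
    # -a preserves metadata; --inplace updates destination files in place
    # instead of writing a temporary copy and renaming it.
    return ["rsync", "-a", "--inplace", src, dst]


cmd = rsync_command(datetime(2016, 10, 25, 12), "dr-far.example.org")
print(" ".join(cmd))
```

Updating files in place matters on a chunk-based store with snapshots: a rewritten temporary copy would allocate new chunks for the whole image, while an in-place update only touches the blocks that changed.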
  • 45.
  • 46. virsh# domblkstat instance-0012 --device vda
        vda rd_req 128
        vda rd_bytes 2344448
        vda wr_req 234
        vda wr_bytes 618496
        vda flush_operations 2
        vda rd_total_times 106512819
        vda wr_total_times 960359872
        vda flush_total_times 1741727
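The slide does not say how these counters are used. One plausible use, sketched here as an assumption, is to sample them periodically and rank VMs by write activity when prioritizing sync queues. `parse_domblkstat` and `write_rate` are hypothetical helpers that work on the text output shown above.

```python
SAMPLE = """\
vda rd_req 128
vda rd_bytes 2344448
vda wr_req 234
vda wr_bytes 618496
vda flush_operations 2
vda rd_total_times 106512819
vda wr_total_times 960359872
vda flush_total_times 1741727
"""


def parse_domblkstat(text):
    """Turn 'device counter value' lines into a {counter: int} dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[1]] = int(parts[2])
    return stats


def write_rate(prev, curr, seconds):
    """Average bytes written per second between two samples."""
    return (curr["wr_bytes"] - prev["wr_bytes"]) / seconds


stats = parse_domblkstat(SAMPLE)
print(f"bytes written so far: {stats['wr_bytes']}")
print(f"average write size:   {stats['wr_bytes'] / stats['wr_req']:.0f} bytes")
```

The counters are cumulative, so a rate needs two samples taken a known interval apart, which is what `write_rate` expects.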
  • 47. Our “pilot light” approach: a running OpenNebula on two nodes, with its own LizardFS store, running only two VMs: the Oracle and the Tester. The Oracle checks if DR is needed, and may need a human confirmation before executing the DR failover. If confirmation is given, it takes the latest valid snapshotted datastore, softlinks it and imports the VMs (through snapshots, so it’s instantaneous). The Tester makes a snapshot of the current stable snapshot, imports the VMs and runs them in a separate, non-routed vnet, then executes a test to see if everything works (workload dependent), then deletes the intermediate snapshots.
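The Oracle's decision flow can be sketched as follows. This is only an illustration: the function names are hypothetical, snapshots are assumed to be named `<yyyymmddhh>`, and treating a snapshot as “valid” when the Tester's check passed is an assumption, since the slide does not define validity.

```python
def latest_valid_snapshot(snapshots):
    """snapshots maps a <yyyymmddhh> name to True if it passed validation.

    Names in this form sort chronologically as plain strings.
    """
    valid = [name for name, ok in snapshots.items() if ok]
    return max(valid) if valid else None


def plan_failover(dr_needed, human_confirmed, snapshots):
    """Return the snapshot to fail over to, or None if no failover should run."""
    if not dr_needed:
        return None
    if not human_confirmed:
        return None  # wait for confirmation before moving anything
    return latest_valid_snapshot(snapshots)


# Hypothetical snapshot history: the most recent one failed validation.
history = {"2016102500": True, "2016102512": True, "2016102600": False}
print(plan_failover(True, True, history))
```

Keeping the confirmation as an explicit gate means the failover never runs on an automated signal alone.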
  • 48. Only critical VMs (RTO < 30 minutes) are run this way. For VMs with a higher RTO, buy one week of hardware on demand, auto-install a node with Puppet or Ansible, and make it join the OpenNebula cloud. Deployment usually takes 30 minutes; other vendors guarantee under 15 minutes.
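The split described on this slide amounts to a per-VM rule keyed on RTO. A minimal sketch, with hypothetical VM names and RTO values:

```python
PILOT_LIGHT_RTO_MINUTES = 30  # threshold from the slide


def recovery_tier(rto_minutes):
    """Pick where a VM is recovered, based on its RTO in minutes."""
    if rto_minutes < PILOT_LIGHT_RTO_MINUTES:
        return "pilot-light"       # restarted on the always-on DR cluster
    return "on-demand-hardware"    # restarted on nodes rented and auto-installed


# Hypothetical inventory: VM name -> RTO in minutes.
inventory = {"erp-db": 10, "mail": 20, "build-server": 240, "wiki": 1440}
plan = {vm: recovery_tier(rto) for vm, rto in inventory.items()}
for vm, tier in plan.items():
    print(f"{vm:12s} -> {tier}")
```

Because the on-demand path takes about 30 minutes on its own, any VM with a tighter RTO has to be in the pilot-light tier.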
  • 49.
  • 50.
  • 51. Ideal for harsh indoor environments that require protection from falling dirt or liquid, dust, light splashing, oil or coolant seepage. Its NEMA Zone 4 rating also makes it perfect for facilities located in earthquake-prone seismic zones or any environment prone to extreme vibration such as factories, power stations, construction areas, shipping facilities, warehouses, processing plants, railroads, airports and military installations.
  • 52.
  • 53.
  • 54. ● Have a “big red button” to stop DR if needed. Sometimes you are already fighting fire here, and you know it’s better not to move everything in flight. ● Have two people that are competent as DR firefighters, and give them a second phone with a rechargeable card. And make sure both don’t go on vacation together. (Hint: don’t choose two married people)
  • 55. ● Use a gateway machine to provide a consistent internal IP scheme, and two different configurations for the gateway router to provide unmodified routing for the remaining VMs ● Aggregate functionality in a single VM (for example, one that manages logs) to optimize writes
  • 56. ● I favor consistency, so I tend to avoid application-level replication, unless it’s native to the app (e.g. NoSQL). Otherwise you end up with different solutions for different machines (e.g. quorum group in MS replication with same UUID…) ● Try to reduce write amplification for databases, especially MySQL, e.g. with TokuDB and its fractal tree
  • 57.