5. Who am I?
• Twitter @adilarif001
• Blog EnterpriseDaddy.com
• I have a job! Senior Technical Support Engineer @ Rubrik
• Background: Sysadmin working with some of the largest IT
organizations at IGATE (now Capgemini).
Most recently at VMware as a TSE with the vSphere product line,
keeping customers happy!
8. What is Disaster Recovery?
Disaster recovery is a set of policies and procedures that focus
on protecting an organization from the significant effects of a
negative event, such as a cyberattack, a natural disaster, or a
building or device failure.
9. Types of Disaster
• Natural
• Hardware or Software failure
• Malicious
• Human Error
• Many more…
10. Common Definitions
• Recovery Point Objective (RPO)
• Recovery Time Objective (RTO)
• Work Recovery Time (WRT)
• Maximum Tolerable Downtime (MTD)
• Business Impact Analysis (BIA)
12. Phases of Disaster Recovery Planning
• Identify: Risk Assessment
• Analyze: Business Impact Analysis (BIA)
• Design: Strategy Selection
• Implement: Create/Execute
• Measure: Test and Maintenance
13. Failover Types
• Planned: Failover is performed ahead of an anticipated disaster, at
a time of your choosing, with no unexpected data loss.
• Unplanned: Failover is forced by an outage at a time you do not
choose, and data loss is possible.
• Test: The process is tested regularly in a manner that does not
impact production.
14. Disaster Recovery Site Types
• Hot Site: Duplicate of the original with real-time replication of
protected systems
• Warm Site: Systems are in place but data is not replicated.
Typically backups must be restored
• Cold Site: A location is available but may not have the required
equipment readily available
15. Why Choose Cloud as a DR Site?
• Traditional DR sites are expensive and are used only for testing
in most cases.
• Consumption-based model.
• Compute charges are minimal; you pay only for storage
consumption.
• Scalability.
• Highly secure.
17. One Solution for multiple infrastructures
• Hyper-V to Hyper-V (on-premises): Replication
• Hyper-V to Microsoft Azure: Replication
• VMware or Physical to VMware (on-premises): Replication
• VMware or Physical to Microsoft Azure: Replication
• Hyper-V to Hyper-V (on-premises): SAN Replication
18. One Solution for multiple infrastructures
• Automated VM level Replication and Failover.
• Planned and Unplanned Failover.
• Orchestrated Recovery Plans for Disaster Recovery.
• Test failover capabilities without impact to Production.
• Migrate to Azure from Anywhere.
19. Recovery Services Vault
• A Recovery Services vault is a storage entity in Azure that houses
data.
• Created in a specific region, like any other ARM resource.
• Some of the key features are:
• Enhanced capabilities to help secure backup data
• Central monitoring for your hybrid IT environment
• Role-Based Access Control (RBAC)
• Protect all configurations of Azure Virtual Machines
• Instant restore for IaaS VMs
22. ASR for Hyper-V workloads
• Prepare Azure.
• Prepare On-premises Hyper-V.
• Set up Disaster Recovery for Hyper-V VMs.
• Set up Disaster Recovery for Hyper-V VMs in VMM clouds (optional).
• Run a Disaster Recovery drill.
• Run failover and failback.
23. Prepare Azure
• Create an Azure Storage Account.
• Create an Azure Recovery Services Vault.
• Set up an Azure network.
24. Prepare On-premises Hyper-V
• Review Hyper-V hosts and VMs requirements.
• Verify Internet Access.
• Prepare VMs so that they are accessible after failover.
25. Set up Disaster Recovery for Hyper-V VMs
• Select a Protection Goal.
• Set up the source and target Replication environment.
• Create a Replication Policy.
• Enable Replication for a VM.
26. Run a Disaster Recovery drill
• Set up an isolated network for the test failover.
• Run a test failover for a single machine.
27. Run failover and failback
• Run a failover to Azure.
• Reprotect Azure VMs to the on-premises site.
• Failback from Azure to on-premises.
• Reprotect on-premises VMs to start replication to Azure again.
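The four steps above form a cycle, and each step only makes sense from the state the previous one leaves behind. A minimal sketch of that ordering as a small state machine follows. The state names, the `TRANSITIONS` table, and the `apply` and `run_cycle` helpers are hypothetical and are not part of Azure Site Recovery; the sketch only models the order of operations listed on the slide.

```python
from enum import Enum


class State(Enum):
    # Hypothetical labels for where the workload runs and where it replicates to.
    REPLICATING_TO_AZURE = "replicating to Azure"
    RUNNING_IN_AZURE = "running in Azure"
    REPLICATING_TO_ON_PREM = "replicating to on-premises"
    RUNNING_ON_PREM = "running on-premises without protection"


# Each (action, current state) pair maps to the state that follows it.
TRANSITIONS = {
    ("failover", State.REPLICATING_TO_AZURE): State.RUNNING_IN_AZURE,
    ("reprotect", State.RUNNING_IN_AZURE): State.REPLICATING_TO_ON_PREM,
    ("failback", State.REPLICATING_TO_ON_PREM): State.RUNNING_ON_PREM,
    ("reprotect", State.RUNNING_ON_PREM): State.REPLICATING_TO_AZURE,
}


def apply(state: State, action: str) -> State:
    """Return the next state, or raise if the action is out of order."""
    try:
        return TRANSITIONS[(action, state)]
    except KeyError:
        raise ValueError(f"cannot {action} while {state.value}") from None


def run_cycle(start: State, actions: list[str]) -> State:
    """Apply the actions in order and return the final state."""
    state = start
    for action in actions:
        state = apply(state, action)
    return state


final = run_cycle(
    State.REPLICATING_TO_AZURE,
    ["failover", "reprotect", "failback", "reprotect"],
)
print(final.value)
```

Running the full sequence returns the workload to its starting state, replicating to Azure again. Skipping a reprotect step (for example, calling "failback" directly after "failover") raises a `ValueError` in this model, which mirrors why the slide lists reprotect before failback.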
29. ASR for VMware workloads
• Prepare Azure.
• Prepare on-premises VMware.
• Set up Disaster Recovery.
• Run a Disaster Recovery drill.
• Run failover and failback.
Note: Steps 1, 4 and 5 are the same as for Hyper-V workloads.
30. Prepare On-premises VMware
• Prepare an account on the vCenter server or vSphere ESXi host, to
automate VM discovery
• Prepare an account for automatic installation of the Mobility service
on VMware VMs
• Review VMware server requirements
• Review VMware VM requirements
31. Set up Disaster Recovery
• Enter the replication source and target.
• Set up the source replication environment, including on-premises
Azure Site Recovery components, and the target replication
environment.
• Create a replication policy.
• Enable replication for a VM.
All businesses run on applications, so it is critical that the IT team is equipped with the necessary tools to keep the apps running all the time.
This means that resiliency for services is critical.
As the definition says, we need to protect against hardware, software, and site failures, among others.
Recovery Point Objective (RPO) determines the maximum acceptable amount of data loss, measured in time. For example, if the maximum amount of data loss you can tolerate is 15 minutes, your RPO is 15 minutes.
Recovery Time Objective (RTO) determines the maximum tolerable amount of time needed to bring all critical systems back online. This covers, for example, restoring data from backup or fixing a failure. In most cases this part is carried out by a system administrator, network administrator, storage administrator, etc.
Work Recovery Time (WRT) determines the maximum tolerable amount of time needed to verify the system and/or data integrity. This could be, for example, checking the databases and logs and making sure the applications or services are running and available. In most cases those tasks are performed by an application administrator, database administrator, etc. When all systems affected by the disaster are verified and/or recovered, the environment is ready to resume production.
Maximum Tolerable Downtime (MTD) defines the total amount of time that a business process can be disrupted without causing unacceptable consequences. This value should be defined by the business management team.
Business Impact Analysis (BIA) is a process to determine the impact a disaster would have on an organization. Typically, your RPO, RTO, WRT, and MTD metrics are developed from a BIA report.
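To make the relationship between these metrics concrete, here is a small sketch that checks a set of objectives for consistency. It assumes the common convention that recovery (RTO) and verification (WRT) happen back to back and together must fit within the MTD; the notes above do not state that formula, so treat it as an assumption. The class name, method names, and sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rpo: timedelta  # maximum acceptable data loss, measured in time
    rto: timedelta  # maximum time to bring critical systems back online
    wrt: timedelta  # maximum time to verify systems and data integrity
    mtd: timedelta  # maximum total disruption the business can tolerate

    def fits_within_mtd(self) -> bool:
        # Assumption: RTO and WRT run one after the other,
        # so their sum must not exceed the MTD.
        return self.rto + self.wrt <= self.mtd

    def meets_rpo(self, replication_interval: timedelta) -> bool:
        # Worst case, a failure hits just before the next replication cycle,
        # so up to one full interval of data can be lost.
        return replication_interval <= self.rpo


# Illustrative values only; real targets come out of the BIA.
objectives = RecoveryObjectives(
    rpo=timedelta(minutes=15),
    rto=timedelta(hours=2),
    wrt=timedelta(hours=1),
    mtd=timedelta(hours=4),
)

print(objectives.fits_within_mtd())
print(objectives.meets_rpo(timedelta(minutes=5)))
print(objectives.meets_rpo(timedelta(minutes=30)))
```

With these sample values, two hours of recovery plus one hour of verification fit inside a four-hour MTD, a 5-minute replication interval satisfies the 15-minute RPO, and a 30-minute interval does not.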
Planned: You perform the failover before the disaster strikes, so any outage happens at a time of your choosing and there should be no unexpected data loss.
Unplanned: Failover is forced, causing an outage at a time you do not choose, and there could be data loss.
Test: The process is tested regularly in a manner that does not impact production.