SlideShare une entreprise Scribd logo
1  sur  15
Disaster Recovery Plan (DRP)
September 8, 2013
After the Outbreak: Rebuilding the World
Original Date: 08/18/13
Revision Date: 08/22/13
Written By: Joe Graziano
Sr. Infrastructure Engineer
W.R.O – World Rebuild Organization
Table of Contents
1. Purpose and Objective............................................................................................................................................3
Scope............................................................................................................................................................................5
2. Dependencies..........................................................................................................................................................6
3. Disaster Recovery Strategies...................................................................................................................................7
4. Disaster Recovery Procedures ................................................................................................................................7
Response Phase............................................................................................................................................................8
Resumption Phase........................................................................................................................................................9
Data Center Recovery...............................................................................................................................................9
Restoration Phase ......................................................................................................................................................11
Data Center Recovery.............................................................................................................................................11
Appendix A: Disaster Recovery Contacts - Admin Contact List .....................................................................................12
Appendix B: Document Maintenance Responsibilities and Revision History ...............................................................13
Appendix C: Server Farm Details ...................................................................................................................................14
Appendix D: Glossary/Terms.........................................................................................................................................15
1. Purpose and Objective
Life as we know it has changed, daily tasks now involve cleaning up the streets of the undead that were put down in the
wake of the most devastating virus mankind has ever seen. With the help of Mr Phil N. Thropist we have built up 3
buildings with the task of re-establishing an infrastructure to support this new world.
The World Recovery Organization hasdeveloped this disaster recovery plan (DRP) to be used in the event of a significant
disruption to the features listed in the table above. The goal of this plan is to outline the key recovery steps to be
performed during and after a disruption to return to normal operations as soon as possible. With the daily threat of the
undead and other factions of humans vying for supplies and dominance this plan is vital to our survival and adaptability.
While the plans for building the 3 datacenters were underway we considered what would happen if any of our
datacenters were compromised. After all, the zombie population is still out there. While we have stopped the spread of
the deadly virus those already affected pose a real threat to the survivors. A disaster recovery/business continuity plan
was needed. Using the specifications for the datacenters being built we calculated what additional hardware and
software would be needed to support any one datacenter should it be compromised. This new plan would mean adding
the following to each datacenter. The premise is should any datacenter fail the surviving datacenters would shoulder
the load that Site Recovery Manager would send to them while the restoration phase is activated and completed.
In order to ensure that any failed datacenter could be recovered we made workload calculations based on the number
of virtual servers and virtual desktops in each datacenter. With this information we added the following hardware and
software to each datacenter:
Hardware:
Datacenter 1
10x Dell Poweredge R905 servers
2x 48 port 1gb modules for Nexus 7000
Additional bricks for the Pillar SAN to accommodate backups
Datacenter 2
40x Dell Poweredge R905 servers
1x Nexus 7000
5x 48 port 1gb modules for Nexus 7000
Additional bricks for the Pillar SAN to accommodate backups
Datacenter 3
10xDell Poweredge R905 servers
2x 48 port 1gb modules for Nexus 7000
Additional bricks for the Pillar SAN to accommodate backups
Software
Syncsort DPX Backup Software
The Syncsort backup software modernizes data protection through highly efficient snapshot and replication
technology, integration with VMware and applications, resulting in dramatically faster backup and recovery operations
Oracle’s Pillar Axiom MaxRep Replication for SAN
This offering from Oracle allows the Pillar SAN to replicate to its partners in the secondary and tertiary
datacenters. With replication we can preserve all of our valuable data in multiple locations so that any failures,
attacks, or catastrophes can be recovered from in a timely manner
This new hardware and software introduces additional costs for the support team and for the organizations
stakeholders.
46% increase in server and network port density (more time required for rack/install/configure)
Addition of Syncsort DPX Backup software
o Syncsort Integrated Backup License ($7600 per 1TB) $2,280,000 for 300TB
o Exchange Mailbox Recovery (500 per license)x6 $3,000 for 3000 users
o Sharepoint Object Recovery $1,900
o Data protection Reporter $4,275
o Maintenance License $17,500
Addition of Axiom MaxRep SAN replication (increase time to create/configure multi-site SAN replication)
o
Scope
The scope of this DRP document addresses technical recovery in the event of a significant disruption. The intent of the
DRP is to be used in conjunction with the business continuity plan (BCP) W.R.O has. A DRP is a subset of the overall
recovery process contained in the BCP. Plans for the recovery of people, infrastructure, and internal and external
dependencies not directly relevant to the technical recovery outlined herein are included in the Business Continuity Plan
and/or the Corporate Incident Response and Incident Management plansW.R.O has in place.
The link to the specific BCP document related to this DRP can be found here: LINK TO BCP
This disaster recovery plan provides:
- Guidelines for determining plan activation;
- Technical response flow and recovery strategy;
- Guidelines forrecovery procedures;
- References to key Business Resumption Plans and technical dependencies;
- Rollback procedures that will be implemented to return to standard operating state;
- Checklists outlining considerations for escalation, incident management, and plan activation.
The specific objectives of thisdisaster recovery plan are to:
- Immediately mobilize a core group of leaders to assess the technical ramifications of a situation;
- Set technical priorities for the recovery team during the recovery period;
- Minimize the impact of the disruption to the impacted features and business groups;
- Stage the restoration of operations to full processing capabilities;
- Enable rollback operations once the disruption has been resolved if determined appropriate by the recovery
team.
- Arm all teams with ample weapons, armor and supplies to handle any Zombie issues that arise.
Within the recovery procedures there are significant dependencies between and supporting technical groups within and
outside W.R.O. This plan is designed to identify the steps that are expected to take to coordinate with other groups /
vendors to enable their own recovery.
2. Dependencies
This section outlines the dependencies made during the development of this disaster recovery plan. If and when needed
theDR TEAM will coordinate with their partner groups as needed to enable recovery.
Dependency Assumptions
User Interface Users (end users, power users, administrators) are unable to access the system
through any part of the instance (e.g. client or server side, web interface or
downloaded application).
Infrastructure and back-end services are still assumed to be active/running.
Business Intelligence /
Reporting
The collection, logging, filtering, and delivery of reported information to end users
is not functioning (with or without the user interface layer also being impacted).
Standardbackup processes are not impacted, but the active / passive or mirrored
processes are not functioning.
Specific types of disruptions could include components that process, match and
transforms information from the other layers. This includes business transaction
processing, report processing and data parsing.
Network Layers Connectivity to network resources is compromised and/or significant latency
issues in the network exist that result in lowered performance in other layers.
Assumption is that terminal connections, serially attached devices and inputs are
still functional.
Storage Layer Loss of SAN, local area storage, or other storage component.
Database Layer Data within the data stores is compromised and is either inaccessible, corrupt, or
unavailable
Hardware/Host Layer Physical components are unavailable or affected by a given event
Virtualizations (VM's) Virtual components are unavailable
Hardware and hosting services are accessible
Administration Support functions are disabled such as management services, backup services,
and log transfer functions.
Other services are presumed functional
In addition assumptions within the Business Continuity Plan for this work stream still apply.
3. Disaster Recovery Strategies
The overall DR strategy of W.R.O.is summarized in the table below and documented in more detail in the supporting
sections. These scenarios and strategies are consistent across the technical layers (user interface, reporting, etc.)
4. Disaster Recovery Procedures
A disaster recovery event can be broken out into three phases, the response, the resumption, and the restoration.
These phases are also managed in parallel with any corresponding business continuity recovery procedures summarized
in the business continuity plan.
Data Center Disruption
Failover to alternate Data Center
Reroute core processes to another Data
Center (without full failover)
Operate at a deprecated service level
Take no action
Significant network or
other issues
Reroute operations to backup processing
unit / service (load balancing, caching)
Wait for service to be restored,
communicate with core stakholders as
needed
•On call personnel paged
•Decision made around recovery strategies to be taken
•Full recovery team identified
Response Phase: The immediate actions following a significant event.
•Recovery procedures implemented
•Coordination with other departments executed as needed
Resumption Phase: Activities necessary to resume services after team has
been notified.
•Rollback procedures implemented
•Operations restored
Restoration Phase: Tasks taken to restore service to previous levels.
Response Phase
The following are the activities, parties and items necessary for a DR response in this phase. Please note these
procedures are the same regardless of the triggering event (e.g. whether caused by a Data Center disruption or other
scenario).
Response Phase Recovery Procedures – All DR Event Scenarios
Step Owner Duration Components
Identify issue, page on call /
Designated Responsible
Individual (DR TEAM)
DR TEAM 15 minutes Issue communicated / escalated
Priority set
Identify the team members
needed for recovery
DR TEAM 15 minutes Selection of core team members required for restoration
phase from among the following groups:
Operations
Establish a conference line for
a bridge call to coordinate next
steps
DR TEAM 10 minutes Primary bridge line: 555-123-9876
Secondary bridge line: 555-123-7485
Alternate / backup communication tools: email,
communicator
Communicate the specific
recovery roles and determine
which recovery strategy will be
pursued.
DR TEAM 60 minutes Documentation / tracking of timelines and next
decisions
Creation of disaster recovery event command center /
“war room” as needed
This information is also summarized by feature in Appendix A: Disaster Recovery Contacts - Admin Contact List.
Resumption Phase
During the resumption phase, steps will be taken to enable recovery to one or more surviving sites. VMware’s Site
Recovery Manager will be at the heart of the tasks required to resume service.
Data Center Recovery
Full Data Center Failover
Step Owner Duration Components
Initiate Failover DR TEAM /
SRM
TBD Restoration procedures identified
Risks assessed for each procedure
Coordination points between groups defined
Complete Failover DR TEAM /
SRM
TBD Recovery steps executed, including handoffs between
key dependencies
Test Recovery DR TEAM /
SRM
TBD Tests assigned and performed
Results summarized and communicated to group
Failover deemed successful DR TEAM TBD Signoff by DR TEAM
In the event of a Full Datacenter Failover the following processes should be followed to implement failover to the
surviving datacenters and to start the process of recovering the failed datacenter.
Connect remotely to the failed/compromised datacenter
Examine backup logs from previous night’s jobs for completion
Confirm replication of SAN LUN’s to secondary and tertiary datacenters
Restore required Tier 1 servers
Restore additional Virtual Desktop servers to manage user workloads
Gracefully shutdown servers at compromised datacenter
Below is a timeline for recovery actions associated with the failover the technical components between different data
centers to provide geo-redundant operations. Coordination of recovery actions is crucial. A timeline is necessary in
order to manage recovery between different groups and layers.
Reroute critical processes to alternate Data Center
Step Owner Duration Components
Identify and agree on failover Mr. Thropist 1 hour Meet with datacenter administration team
Assess outage and options for mitigation
Declare disaster
Remote access to Datacenter Joe Graziano
Backup Admin
SAN Admin
30min Ensure safe location, either other datacenter or mobile
access location
Make VPN connection to datacenter
Examine Backup Logs and SAN
Replication
Backup Admin
SAN Admin
1 hour Review Syncsort logs to ensure successful backups
Review SAN replication logs to ensure current
replication
Instant Virtualize (IV) Tier 1
Servers
Joe Graziano
Server Admin
8 hours Instant virtualize (IV) Tier 1 servers
Full Recover Tier 1 Servers
(RRP)
Joe Graziano
Server Admin
16 hours Kick off Rapid Return to Production on Tier 1 servers
Instant Virtualize (IV) Virtual
Desktop Servers
Joe Graziano
Virtual Desktop
Admin
4 hours Instant virtualize (IV) Virtual Desktop servers
Full Recover Virtual Desktop
Servers (RRP)
Joe Graziano
Virtual Desktop
Admin
8 hours Kick off Rapid Return to Production on Virtual Desktop
servers
Shutdown servers at failed
Datacenter
Joe Graziano 2 hours Power down all servers at failed datacenter
Power down SAN and ESX hosts
Restoration Phase
During the restoration phase, the steps taken to enable recovery will vary based on the type of issue. The procedures for
each recovery scenario are summarized below.
Data Center Recovery
Full Data Center Restoration
Step Owner Duration Components
Determine whether failback to
original Data Center will be
pursued
DR TEAM 1 Week Define Restoration procedures
Recon the failed datacenter
Map out entry points
Reinforce arms, weapons and supplies for raid
Clear failed datacenter of
Zombies
Daryl Dixon
Rick Grimes
Glen Rhee
Swat Team
5 hours Establish a perimeter
Lock down escape routes
Enter datacenter
Take no prisoners / Kill Zombies
Original data center restored DR TEAM 1 week Examine Network
Examine Servers / SAN
Repair/replace any damaged equipment
Burn bodies
Clean all building floors for repopulation
Complete Failback DR TEAM 16 hours Shutdown Tier 1 server at secondary and tertiary
datacenters
Power up Tier 1 servers in recovered datacenter
Test Failback DR TEAM TBD Tests assigned and performed
Results summarized and communicated to group
Issues (if any) communicated to group
Determine whether failback
was successful
DR TEAM TBD Declaration of successful failback and communication
to stakeholder group.
Disaster recovery procedures closed.
Results summarized, post mortem performed, and DRP
updated (as needed).
Appendix A: Disaster Recovery Contacts - Admin Contact List
The critical team members who would be involved in recovery procedures for feature sets are summarized below.
Feature Name Contact Lists
Oracle’s Pillar Axiom MaxRep Replication for SAN SAN TEAM
Syncsort DPX backup software for VMware Backup Admin Team
Failed Datacenter Recovery Team SWAT Team
Tier 1 Server failover and testing Server Admin Team
Virtual Desktop Server failover and testing Virtual Desktop Admin Team
SAN Team
Joe Graziano
Chris Wahl
Charles Bartowski
Backup Admin Team
Jonathan Frapier
Scott Lowe
Luther Stickell
SWAT Team
Daryl Dixon
Rick Grimes
Glen Rhee
Eric Wright
Server Admin Team
Akmal Waheed
Mike Laverick
Dade Murphy
Virtual Desktop Admin Team
Josh Atwell
Angelo Luciani
Emanuel Goldstein
Appendix B: Document Maintenance Responsibilities and Revision History
This section identifies the individuals and their roles and responsibilities for maintaining this Disaster Recovery Plan.
Primary Disaster Recovery Plan document owner is:Joe Graziano
Primary Designee: Joe Graziano
Alternate Designee #1: Jonathan Frapier
Alternate Designee #2: Akmal Waheed
Name of Person
Updating Document
Date Update Description Version # Approved By
Joe Graziano 8/20/2013 Added VMware SRM
documentation
1.1 Phil N. Thropist
Joe Graziano 8/21/2103 Added Alternate Designee’s
Jonathan and Akmal
1.2 Phil N. Thropist
Joe Graziano 8/22/2103 Completed final draft for release
to Virtual Design Master
1.3 Phil N. Thropist
Appendix C: Server Farm Details
Datacenter 1 Diagrams
Datacenter 2 Diagrams
Datacenter 3 Diagrams
Appendix D: Glossary/Terms
Standard Operating State: Production state where services are functioning at standard state levels. In contrast to
recovery state operating levels, this can support business functions at minimum but deprecated levels.
Presentation Layer: Layer which users interact with. This typically encompasses systems that support the UI, manage
rendering, and captures user interactions. User responses are parsed and system requests are passed for processing
and data retrieval to the appropriate layer.
Processing Layer: System layer which processes and synthesizes user input, data output, and transactional operations
within an application stack. Typically this layer processes data from the other layers. Typically these services are folded
into the presentation and database layer, however for intensive applications; this is usually broken out into its own
layer.
Database Layer:The database layer is where data typically resides in an application stack. Typically data is stored in a
relational database such as SQL Server, Microsoft Access, or Oracle, but it can be stored as XML, raw data, or tables.
This layer typically is optimized for data querying, processing and retrieval.
Network Layer: The network layer is responsible for directing and managing traffic between physical hosts. It is
typically an infrastructure layer and is usually outside the purview of most business units. This layer usually supports
load balancing, geo-redundancy, and clustering.
Storage Layer:This is typically an infrastructure layer and provides data storage and access. In most environments this is
usually regarded as SAN or NAS storage.
Hardware/Host Layer: This layer refers to the physical machines that all other layers are reliant upon. Depending on
the organization, management of the physical layer can be performed by the stack owner or the purview of an
infrastructure support group.
Virtualization Layer: In some environments virtual machines (VM's) are used to partition/encapsulate a machine's
resources to behave as separate distinct hosts. The virtualization layer refers to these virtual machines.
Administrative Layer: The administrative layer encompasses the supporting technology components which provide
access, administration, backups, and monitoring of the other layers.

Contenu connexe

Tendances

Phase II 12.14.14
Phase II 12.14.14Phase II 12.14.14
Phase II 12.14.14
haney888
 
Presentation on backup and recoveryyyyyyyyyyyyy
Presentation on backup and recoveryyyyyyyyyyyyyPresentation on backup and recoveryyyyyyyyyyyyy
Presentation on backup and recoveryyyyyyyyyyyyy
Tehmina Gulfam
 
TECHNICAL WHITE PAPER▶NetBackup 5330 Resiliency/High Availability Attributes
TECHNICAL WHITE PAPER▶NetBackup 5330 Resiliency/High Availability AttributesTECHNICAL WHITE PAPER▶NetBackup 5330 Resiliency/High Availability Attributes
TECHNICAL WHITE PAPER▶NetBackup 5330 Resiliency/High Availability Attributes
Symantec
 
Data guard architecture
Data guard architectureData guard architecture
Data guard architecture
Vimlendu Kumar
 

Tendances (20)

Phase II 12.14.14
Phase II 12.14.14Phase II 12.14.14
Phase II 12.14.14
 
AITP July 2012 Presentation - Disaster Recovery - Business + Technology
AITP July 2012 Presentation - Disaster Recovery - Business + TechnologyAITP July 2012 Presentation - Disaster Recovery - Business + Technology
AITP July 2012 Presentation - Disaster Recovery - Business + Technology
 
Presentation on backup and recoveryyyyyyyyyyyyy
Presentation on backup and recoveryyyyyyyyyyyyyPresentation on backup and recoveryyyyyyyyyyyyy
Presentation on backup and recoveryyyyyyyyyyyyy
 
TECHNICAL BRIEF▶NetBackup Appliance AutoSupport for NetBackup 5330
TECHNICAL BRIEF▶NetBackup Appliance AutoSupport for NetBackup 5330TECHNICAL BRIEF▶NetBackup Appliance AutoSupport for NetBackup 5330
TECHNICAL BRIEF▶NetBackup Appliance AutoSupport for NetBackup 5330
 
HPE Data Protector Best Practice Guide
HPE Data Protector Best Practice GuideHPE Data Protector Best Practice Guide
HPE Data Protector Best Practice Guide
 
TECHNICAL WHITE PAPER▶NetBackup 5330 Resiliency/High Availability Attributes
TECHNICAL WHITE PAPER▶NetBackup 5330 Resiliency/High Availability AttributesTECHNICAL WHITE PAPER▶NetBackup 5330 Resiliency/High Availability Attributes
TECHNICAL WHITE PAPER▶NetBackup 5330 Resiliency/High Availability Attributes
 
Select enterprise backup software
Select enterprise backup softwareSelect enterprise backup software
Select enterprise backup software
 
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
 
Meeting Your Virtual RTO / RPO Objectives
Meeting Your Virtual RTO / RPO ObjectivesMeeting Your Virtual RTO / RPO Objectives
Meeting Your Virtual RTO / RPO Objectives
 
Small Business Backup & Disaster Recovery
Small Business Backup & Disaster RecoverySmall Business Backup & Disaster Recovery
Small Business Backup & Disaster Recovery
 
Data guard oracle
Data guard oracleData guard oracle
Data guard oracle
 
Dp%20 fudamentals%20%28ch1%29
Dp%20 fudamentals%20%28ch1%29Dp%20 fudamentals%20%28ch1%29
Dp%20 fudamentals%20%28ch1%29
 
TECHNICAL WHITE PAPER: NetBackup Appliances WAN Optimization
TECHNICAL WHITE PAPER: NetBackup Appliances WAN OptimizationTECHNICAL WHITE PAPER: NetBackup Appliances WAN Optimization
TECHNICAL WHITE PAPER: NetBackup Appliances WAN Optimization
 
Automate DG Best Practices
Automate DG  Best PracticesAutomate DG  Best Practices
Automate DG Best Practices
 
hpe manual data protector 9.07 granular extension guides
hpe manual data protector 9.07 granular extension guideshpe manual data protector 9.07 granular extension guides
hpe manual data protector 9.07 granular extension guides
 
Data Center Optimization
Data Center OptimizationData Center Optimization
Data Center Optimization
 
Database recovery
Database recoveryDatabase recovery
Database recovery
 
TECHNICAL BRIEF▶ NetBackup 7.6 Deduplication Technology
TECHNICAL BRIEF▶ NetBackup 7.6 Deduplication TechnologyTECHNICAL BRIEF▶ NetBackup 7.6 Deduplication Technology
TECHNICAL BRIEF▶ NetBackup 7.6 Deduplication Technology
 
Data guard architecture
Data guard architectureData guard architecture
Data guard architecture
 
Microsoft SQL High Availability and Scaling
Microsoft SQL High Availability and ScalingMicrosoft SQL High Availability and Scaling
Microsoft SQL High Availability and Scaling
 

En vedette (8)

Akmal Khaleeq Waheed - Challenge 3 p2
Akmal Khaleeq Waheed - Challenge 3 p2Akmal Khaleeq Waheed - Challenge 3 p2
Akmal Khaleeq Waheed - Challenge 3 p2
 
Joe Graziano – Challenge 2 Design Solution (Part 2)
Joe Graziano – Challenge 2 Design Solution (Part 2)Joe Graziano – Challenge 2 Design Solution (Part 2)
Joe Graziano – Challenge 2 Design Solution (Part 2)
 
Akmal Khaleeq Waheed - Challenge 3 p3
Akmal Khaleeq Waheed - Challenge 3 p3Akmal Khaleeq Waheed - Challenge 3 p3
Akmal Khaleeq Waheed - Challenge 3 p3
 
Joe Graziano – Challenge 2 Design Solution Maxrep data-sheet-1727271
Joe Graziano – Challenge 2 Design Solution  Maxrep data-sheet-1727271Joe Graziano – Challenge 2 Design Solution  Maxrep data-sheet-1727271
Joe Graziano – Challenge 2 Design Solution Maxrep data-sheet-1727271
 
Virtual Design Master Challenge 1 - Joe
Virtual Design Master Challenge 1 - JoeVirtual Design Master Challenge 1 - Joe
Virtual Design Master Challenge 1 - Joe
 
Rebuilding theworld
Rebuilding theworldRebuilding theworld
Rebuilding theworld
 
Toronto VMUG - November 13, 2013 - CiRBA
Toronto VMUG - November 13, 2013 - CiRBAToronto VMUG - November 13, 2013 - CiRBA
Toronto VMUG - November 13, 2013 - CiRBA
 
Joe Graziano – Challenge 2 Design Solution - Syncsort dpx 411
Joe Graziano – Challenge 2 Design Solution  - Syncsort dpx 411Joe Graziano – Challenge 2 Design Solution  - Syncsort dpx 411
Joe Graziano – Challenge 2 Design Solution - Syncsort dpx 411
 

Similaire à Joe Graziano – Challenge 2 Design Solution (Part 1)

Disaster Recovery Plan
Disaster Recovery PlanDisaster Recovery Plan
Disaster Recovery Plan
David Donovan
 
COMPANY Disaster Recovery Plan (DRP) for [PRODU.docx
COMPANY    Disaster Recovery Plan (DRP) for [PRODU.docxCOMPANY    Disaster Recovery Plan (DRP) for [PRODU.docx
COMPANY Disaster Recovery Plan (DRP) for [PRODU.docx
monicafrancis71118
 
ProfitBricks-white-paper-Disaster-Recovery-US
ProfitBricks-white-paper-Disaster-Recovery-USProfitBricks-white-paper-Disaster-Recovery-US
ProfitBricks-white-paper-Disaster-Recovery-US
Mudia Akpobome
 
Creating And Implementing A Data Disaster Recovery Plan
Creating And Implementing A Data Disaster Recovery PlanCreating And Implementing A Data Disaster Recovery Plan
Creating And Implementing A Data Disaster Recovery Plan
Rishu Mehra
 
Creating And Implementing A Data Disaster Recovery Plan
Creating And Implementing A Data  Disaster  Recovery  PlanCreating And Implementing A Data  Disaster  Recovery  Plan
Creating And Implementing A Data Disaster Recovery Plan
Rishu Mehra
 

Similaire à Joe Graziano – Challenge 2 Design Solution (Part 1) (20)

Mastering Backup and Disaster Recovery: Ensuring Data Continuity and Resilience
Mastering Backup and Disaster Recovery: Ensuring Data Continuity and ResilienceMastering Backup and Disaster Recovery: Ensuring Data Continuity and Resilience
Mastering Backup and Disaster Recovery: Ensuring Data Continuity and Resilience
 
Disaster Recovery Plan
Disaster Recovery PlanDisaster Recovery Plan
Disaster Recovery Plan
 
COMPANY Disaster Recovery Plan (DRP) for [PRODU.docx
COMPANY    Disaster Recovery Plan (DRP) for [PRODU.docxCOMPANY    Disaster Recovery Plan (DRP) for [PRODU.docx
COMPANY Disaster Recovery Plan (DRP) for [PRODU.docx
 
Maximizing Business Continuity Success
Maximizing Business Continuity SuccessMaximizing Business Continuity Success
Maximizing Business Continuity Success
 
Xd planning guide - storage best practices
Xd   planning guide - storage best practicesXd   planning guide - storage best practices
Xd planning guide - storage best practices
 
Disaster Recovery: Understanding Trend, Methodology, Solution, and Standard
Disaster Recovery:  Understanding Trend, Methodology, Solution, and StandardDisaster Recovery:  Understanding Trend, Methodology, Solution, and Standard
Disaster Recovery: Understanding Trend, Methodology, Solution, and Standard
 
Shielding Data Assets: Exploring Data Protection and Disaster Recovery Strate...
Shielding Data Assets: Exploring Data Protection and Disaster Recovery Strate...Shielding Data Assets: Exploring Data Protection and Disaster Recovery Strate...
Shielding Data Assets: Exploring Data Protection and Disaster Recovery Strate...
 
Deepak_ppt_ver1.0.pptx
Deepak_ppt_ver1.0.pptxDeepak_ppt_ver1.0.pptx
Deepak_ppt_ver1.0.pptx
 
IBM PROTECTIER: FROM BACKUP TO RECOVERY
IBM PROTECTIER: FROM BACKUP TO RECOVERYIBM PROTECTIER: FROM BACKUP TO RECOVERY
IBM PROTECTIER: FROM BACKUP TO RECOVERY
 
Module 4 disaster recovery student slides ver 1.0
Module 4 disaster recovery   student slides ver 1.0Module 4 disaster recovery   student slides ver 1.0
Module 4 disaster recovery student slides ver 1.0
 
Boomerang Total Recall
Boomerang Total RecallBoomerang Total Recall
Boomerang Total Recall
 
What every IT audit should know about backup and recovery
What every IT audit should know about backup and recoveryWhat every IT audit should know about backup and recovery
What every IT audit should know about backup and recovery
 
IRJET-Comparative Analysis of Disaster Recovery Solutions in Cloud Computing
IRJET-Comparative Analysis of Disaster Recovery Solutions in Cloud ComputingIRJET-Comparative Analysis of Disaster Recovery Solutions in Cloud Computing
IRJET-Comparative Analysis of Disaster Recovery Solutions in Cloud Computing
 
ProfitBricks-white-paper-Disaster-Recovery-US
ProfitBricks-white-paper-Disaster-Recovery-USProfitBricks-white-paper-Disaster-Recovery-US
ProfitBricks-white-paper-Disaster-Recovery-US
 
HADRFINAL13112016
HADRFINAL13112016HADRFINAL13112016
HADRFINAL13112016
 
Audax Group: CIO Perspectives - Managing The Copy Data Explosion
Audax Group: CIO Perspectives - Managing The Copy Data ExplosionAudax Group: CIO Perspectives - Managing The Copy Data Explosion
Audax Group: CIO Perspectives - Managing The Copy Data Explosion
 
Bluelock's Recovery Suite
Bluelock's Recovery SuiteBluelock's Recovery Suite
Bluelock's Recovery Suite
 
Business Continuity Presentation[1]
Business Continuity Presentation[1]Business Continuity Presentation[1]
Business Continuity Presentation[1]
 
Creating And Implementing A Data Disaster Recovery Plan
Creating And Implementing A Data Disaster Recovery PlanCreating And Implementing A Data Disaster Recovery Plan
Creating And Implementing A Data Disaster Recovery Plan
 
Creating And Implementing A Data Disaster Recovery Plan
Creating And Implementing A Data  Disaster  Recovery  PlanCreating And Implementing A Data  Disaster  Recovery  Plan
Creating And Implementing A Data Disaster Recovery Plan
 

Plus de tovmug

Joe Graziano – Challenge 2 Design Solution V dm2 datacenter3
Joe Graziano – Challenge 2 Design Solution  V dm2 datacenter3Joe Graziano – Challenge 2 Design Solution  V dm2 datacenter3
Joe Graziano – Challenge 2 Design Solution V dm2 datacenter3
tovmug
 
Joe Graziano – Challenge 2 Design Solution - V dm2 datacenter2
Joe Graziano – Challenge 2 Design Solution  - V dm2 datacenter2Joe Graziano – Challenge 2 Design Solution  - V dm2 datacenter2
Joe Graziano – Challenge 2 Design Solution - V dm2 datacenter2
tovmug
 
Joe Graziano – Challenge 2 Design Solution V dm2 datacenter1
Joe Graziano – Challenge 2 Design Solution V dm2 datacenter1Joe Graziano – Challenge 2 Design Solution V dm2 datacenter1
Joe Graziano – Challenge 2 Design Solution V dm2 datacenter1
tovmug
 
Trend Micro Dec 6 Toronto VMUG
Trend Micro Dec 6 Toronto VMUGTrend Micro Dec 6 Toronto VMUG
Trend Micro Dec 6 Toronto VMUG
tovmug
 
Cisco Dec 6 Toronto VMUG
Cisco Dec 6 Toronto VMUGCisco Dec 6 Toronto VMUG
Cisco Dec 6 Toronto VMUG
tovmug
 

Plus de tovmug (11)

Akmal Khaleeq Waheed - Challenge 3 p1
Akmal Khaleeq Waheed - Challenge 3 p1Akmal Khaleeq Waheed - Challenge 3 p1
Akmal Khaleeq Waheed - Challenge 3 p1
 
Akmal Khaleeq Waheed - Challenge 3
Akmal Khaleeq Waheed - Challenge 3Akmal Khaleeq Waheed - Challenge 3
Akmal Khaleeq Waheed - Challenge 3
 
Jonathan Frappier - Challenge 3
Jonathan Frappier - Challenge 3Jonathan Frappier - Challenge 3
Jonathan Frappier - Challenge 3
 
Joe Graziano – Challenge 2 Design Solution V dm2 datacenter3
Joe Graziano – Challenge 2 Design Solution  V dm2 datacenter3Joe Graziano – Challenge 2 Design Solution  V dm2 datacenter3
Joe Graziano – Challenge 2 Design Solution V dm2 datacenter3
 
Joe Graziano – Challenge 2 Design Solution - V dm2 datacenter2
Joe Graziano – Challenge 2 Design Solution  - V dm2 datacenter2Joe Graziano – Challenge 2 Design Solution  - V dm2 datacenter2
Joe Graziano – Challenge 2 Design Solution - V dm2 datacenter2
 
Joe Graziano – Challenge 2 Design Solution V dm2 datacenter1
Joe Graziano – Challenge 2 Design Solution V dm2 datacenter1Joe Graziano – Challenge 2 Design Solution V dm2 datacenter1
Joe Graziano – Challenge 2 Design Solution V dm2 datacenter1
 
Akmal Waheed – Challenge 2 Design Solution
Akmal Waheed – Challenge 2 Design Solution Akmal Waheed – Challenge 2 Design Solution
Akmal Waheed – Challenge 2 Design Solution
 
Virtual Design Master Challenge 1 - Jonathan
Virtual Design Master Challenge 1  - JonathanVirtual Design Master Challenge 1  - Jonathan
Virtual Design Master Challenge 1 - Jonathan
 
Virtual Design Master Challenge 1 - Akmal
Virtual Design Master Challenge 1  - AkmalVirtual Design Master Challenge 1  - Akmal
Virtual Design Master Challenge 1 - Akmal
 
Trend Micro Dec 6 Toronto VMUG
Trend Micro Dec 6 Toronto VMUGTrend Micro Dec 6 Toronto VMUG
Trend Micro Dec 6 Toronto VMUG
 
Cisco Dec 6 Toronto VMUG
Cisco Dec 6 Toronto VMUGCisco Dec 6 Toronto VMUG
Cisco Dec 6 Toronto VMUG
 

Dernier

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Dernier (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

Joe Graziano – Challenge 2 Design Solution (Part 1)

  • 1. Disaster Recovery Plan (DRP) September 8, 2013 After the Outbreak: Rebuilding the World Original Date: 08/18/13 Revision Date: 08/22/13 Written By: Joe Graziano Sr. Infrastructure Engineer W.R.O – World Rebuild Organization
  • 2. Table of Contents 1. Purpose and Objective............................................................................................................................................3 Scope............................................................................................................................................................................5 2. Dependencies..........................................................................................................................................................6 3. Disaster Recovery Strategies...................................................................................................................................7 4. Disaster Recovery Procedures ................................................................................................................................7 Response Phase............................................................................................................................................................8 Resumption Phase........................................................................................................................................................9 Data Center Recovery...............................................................................................................................................9 Restoration Phase ......................................................................................................................................................11 Data Center Recovery.............................................................................................................................................11 Appendix A: Disaster Recovery Contacts - Admin Contact List .....................................................................................12 Appendix B: Document Maintenance Responsibilities and Revision History ...............................................................13 Appendix C: Server Farm Details ...................................................................................................................................14 Appendix D: Glossary/Terms.........................................................................................................................................15
  • 3. 1. Purpose and Objective Life as we know it has changed, daily tasks now involve cleaning up the streets of the undead that were put down in the wake of the most devastating virus mankind has ever seen. With the help of Mr Phil N. Thropist we have built up 3 buildings with the task of re-establishing an infrastructure to support this new world. The World Recovery Organization hasdeveloped this disaster recovery plan (DRP) to be used in the event of a significant disruption to the features listed in the table above. The goal of this plan is to outline the key recovery steps to be performed during and after a disruption to return to normal operations as soon as possible. With the daily threat of the undead and other factions of humans vying for supplies and dominance this plan is vital to our survival and adaptability. While the plans for building the 3 datacenters were underway we considered what would happen if any of our datacenters were compromised. After all, the zombie population is still out there. While we have stopped the spread of the deadly virus those already affected pose a real threat to the survivors. A disaster recovery/business continuity plan was needed. Using the specifications for the datacenters being built we calculated what additional hardware and software would be needed to support any one datacenter should it be compromised. This new plan would mean adding the following to each datacenter. The premise is should any datacenter fail the surviving datacenters would shoulder the load that Site Recovery Manager would send to them while the restoration phase is activated and completed. In order to ensure that any failed datacenter could be recovered we made workload calculations based on the number of virtual servers and virtual desktops in each datacenter. With this information we added the following hardware and software to each datacenter: Hardware: Datacenter 1 10x Dell Poweredge R905 servers 2x 48 port 1gb modules for Nexus 7000 Additional bricks for the Pillar SAN to accommodate backups Datacenter 2 40x Dell Poweredge R905 servers 1x Nexus 7000 5x 48 port 1gb modules for Nexus 7000 Additional bricks for the Pillar SAN to accommodate backups Datacenter 3 10xDell Poweredge R905 servers 2x 48 port 1gb modules for Nexus 7000 Additional bricks for the Pillar SAN to accommodate backups
  • 4. Software Syncsort DPX Backup Software The Syncsort backup software modernizes data protection through highly efficient snapshot and replication technology, integration with VMware and applications, resulting in dramatically faster backup and recovery operations Oracle’s Pillar Axiom MaxRep Replication for SAN This offering from Oracle allows the Pillar SAN to replicate to its partners in the secondary and tertiary datacenters. With replication we can preserve all of our valuable data in multiple locations so that any failures, attacks, or catastrophes can be recovered from in a timely manner This new hardware and software introduces additional costs for the support team and for the organizations stakeholders. 46% increase in server and network port density (more time required for rack/install/configure) Addition of Syncsort DPX Backup software o Syncsort Integrated Backup License ($7600 per 1TB) $2,280,000 for 300TB o Exchange Mailbox Recovery (500 per license)x6 $3,000 for 3000 users o Sharepoint Object Recovery $1,900 o Data protection Reporter $4,275 o Maintenance License $17,500 Addition of Axiom MaxRep SAN replication (increase time to create/configure multi-site SAN replication) o
  • 5. Scope The scope of this DRP document addresses technical recovery in the event of a significant disruption. The intent of the DRP is to be used in conjunction with the business continuity plan (BCP) W.R.O has. A DRP is a subset of the overall recovery process contained in the BCP. Plans for the recovery of people, infrastructure, and internal and external dependencies not directly relevant to the technical recovery outlined herein are included in the Business Continuity Plan and/or the Corporate Incident Response and Incident Management plansW.R.O has in place. The link to the specific BCP document related to this DRP can be found here: LINK TO BCP This disaster recovery plan provides: - Guidelines for determining plan activation; - Technical response flow and recovery strategy; - Guidelines forrecovery procedures; - References to key Business Resumption Plans and technical dependencies; - Rollback procedures that will be implemented to return to standard operating state; - Checklists outlining considerations for escalation, incident management, and plan activation. The specific objectives of thisdisaster recovery plan are to: - Immediately mobilize a core group of leaders to assess the technical ramifications of a situation; - Set technical priorities for the recovery team during the recovery period; - Minimize the impact of the disruption to the impacted features and business groups; - Stage the restoration of operations to full processing capabilities; - Enable rollback operations once the disruption has been resolved if determined appropriate by the recovery team. - Arm all teams with ample weapons, armor and supplies to handle any Zombie issues that arise. Within the recovery procedures there are significant dependencies between and supporting technical groups within and outside W.R.O. This plan is designed to identify the steps that are expected to take to coordinate with other groups / vendors to enable their own recovery.
  • 6. 2. Dependencies This section outlines the dependencies made during the development of this disaster recovery plan. If and when needed theDR TEAM will coordinate with their partner groups as needed to enable recovery. Dependency Assumptions User Interface Users (end users, power users, administrators) are unable to access the system through any part of the instance (e.g. client or server side, web interface or downloaded application). Infrastructure and back-end services are still assumed to be active/running. Business Intelligence / Reporting The collection, logging, filtering, and delivery of reported information to end users is not functioning (with or without the user interface layer also being impacted). Standardbackup processes are not impacted, but the active / passive or mirrored processes are not functioning. Specific types of disruptions could include components that process, match and transforms information from the other layers. This includes business transaction processing, report processing and data parsing. Network Layers Connectivity to network resources is compromised and/or significant latency issues in the network exist that result in lowered performance in other layers. Assumption is that terminal connections, serially attached devices and inputs are still functional. Storage Layer Loss of SAN, local area storage, or other storage component. Database Layer Data within the data stores is compromised and is either inaccessible, corrupt, or unavailable Hardware/Host Layer Physical components are unavailable or affected by a given event Virtualizations (VM's) Virtual components are unavailable Hardware and hosting services are accessible Administration Support functions are disabled such as management services, backup services, and log transfer functions. Other services are presumed functional In addition assumptions within the Business Continuity Plan for this work stream still apply.
  • 7. 3. Disaster Recovery Strategies The overall DR strategy of W.R.O.is summarized in the table below and documented in more detail in the supporting sections. These scenarios and strategies are consistent across the technical layers (user interface, reporting, etc.) 4. Disaster Recovery Procedures A disaster recovery event can be broken out into three phases, the response, the resumption, and the restoration. These phases are also managed in parallel with any corresponding business continuity recovery procedures summarized in the business continuity plan. Data Center Disruption Failover to alternate Data Center Reroute core processes to another Data Center (without full failover) Operate at a deprecated service level Take no action Significant network or other issues Reroute operations to backup processing unit / service (load balancing, caching) Wait for service to be restored, communicate with core stakholders as needed •On call personnel paged •Decision made around recovery strategies to be taken •Full recovery team identified Response Phase: The immediate actions following a significant event. •Recovery procedures implemented •Coordination with other departments executed as needed Resumption Phase: Activities necessary to resume services after team has been notified. •Rollback procedures implemented •Operations restored Restoration Phase: Tasks taken to restore service to previous levels.
  • 8. Response Phase The following are the activities, parties and items necessary for a DR response in this phase. Please note these procedures are the same regardless of the triggering event (e.g. whether caused by a Data Center disruption or other scenario). Response Phase Recovery Procedures – All DR Event Scenarios Step Owner Duration Components Identify issue, page on call / Designated Responsible Individual (DR TEAM) DR TEAM 15 minutes Issue communicated / escalated Priority set Identify the team members needed for recovery DR TEAM 15 minutes Selection of core team members required for restoration phase from among the following groups: Operations Establish a conference line for a bridge call to coordinate next steps DR TEAM 10 minutes Primary bridge line: 555-123-9876 Secondary bridge line: 555-123-7485 Alternate / backup communication tools: email, communicator Communicate the specific recovery roles and determine which recovery strategy will be pursued. DR TEAM 60 minutes Documentation / tracking of timelines and next decisions Creation of disaster recovery event command center / “war room” as needed This information is also summarized by feature in Appendix A: Disaster Recovery Contacts - Admin Contact List.
  • 9. Resumption Phase During the resumption phase, steps will be taken to enable recovery to one or more surviving sites. VMware’s Site Recovery Manager will be at the heart of the tasks required to resume service. Data Center Recovery Full Data Center Failover Step Owner Duration Components Initiate Failover DR TEAM / SRM TBD Restoration procedures identified Risks assessed for each procedure Coordination points between groups defined Complete Failover DR TEAM / SRM TBD Recovery steps executed, including handoffs between key dependencies Test Recovery DR TEAM / SRM TBD Tests assigned and performed Results summarized and communicated to group Failover deemed successful DR TEAM TBD Signoff by DR TEAM In the event of a Full Datacenter Failover the following processes should be followed to implement failover to the surviving datacenters and to start the process of recovering the failed datacenter. Connect remotely to the failed/compromised datacenter Examine backup logs from previous night’s jobs for completion Confirm replication of SAN LUN’s to secondary and tertiary datacenters Restore required Tier 1 servers Restore additional Virtual Desktop servers to manage user workloads Gracefully shutdown servers at compromised datacenter Below is a timeline for recovery actions associated with the failover the technical components between different data centers to provide geo-redundant operations. Coordination of recovery actions is crucial. A timeline is necessary in order to manage recovery between different groups and layers.
  • 10. Reroute critical processes to alternate Data Center Step Owner Duration Components Identify and agree on failover Mr. Thropist 1 hour Meet with datacenter administration team Assess outage and options for mitigation Declare disaster Remote access to Datacenter Joe Graziano Backup Admin SAN Admin 30min Ensure safe location, either other datacenter or mobile access location Make VPN connection to datacenter Examine Backup Logs and SAN Replication Backup Admin SAN Admin 1 hour Review Syncsort logs to ensure successful backups Review SAN replication logs to ensure current replication Instant Virtualize (IV) Tier 1 Servers Joe Graziano Server Admin 8 hours Instant virtualize (IV) Tier 1 servers Full Recover Tier 1 Servers (RRP) Joe Graziano Server Admin 16 hours Kick off Rapid Return to Production on Tier 1 servers Instant Virtualize (IV) Virtual Desktop Servers Joe Graziano Virtual Desktop Admin 4 hours Instant virtualize (IV) Virtual Desktop servers Full Recover Virtual Desktop Servers (RRP) Joe Graziano Virtual Desktop Admin 8 hours Kick off Rapid Return to Production on Virtual Desktop servers Shutdown servers at failed Datacenter Joe Graziano 2 hours Power down all servers at failed datacenter Power down SAN and ESX hosts
  • 11. Restoration Phase During the restoration phase, the steps taken to enable recovery will vary based on the type of issue. The procedures for each recovery scenario are summarized below. Data Center Recovery Full Data Center Restoration Step Owner Duration Components Determine whether failback to original Data Center will be pursued DR TEAM 1 Week Define Restoration procedures Recon the failed datacenter Map out entry points Reinforce arms, weapons and supplies for raid Clear failed datacenter of Zombies Daryl Dixon Rick Grimes Glen Rhee Swat Team 5 hours Establish a perimeter Lock down escape routes Enter datacenter Take no prisoners / Kill Zombies Original data center restored DR TEAM 1 week Examine Network Examine Servers / SAN Repair/replace any damaged equipment Burn bodies Clean all building floors for repopulation Complete Failback DR TEAM 16 hours Shutdown Tier 1 server at secondary and tertiary datacenters Power up Tier 1 servers in recovered datacenter Test Failback DR TEAM TBD Tests assigned and performed Results summarized and communicated to group Issues (if any) communicated to group Determine whether failback was successful DR TEAM TBD Declaration of successful failback and communication to stakeholder group. Disaster recovery procedures closed. Results summarized, post mortem performed, and DRP updated (as needed).
  • 12. Appendix A: Disaster Recovery Contacts - Admin Contact List The critical team members who would be involved in recovery procedures for feature sets are summarized below. Feature Name Contact Lists Oracle’s Pillar Axiom MaxRep Replication for SAN SAN TEAM Syncsort DPX backup software for VMware Backup Admin Team Failed Datacenter Recovery Team SWAT Team Tier 1 Server failover and testing Server Admin Team Virtual Desktop Server failover and testing Virtual Desktop Admin Team SAN Team Joe Graziano Chris Wahl Charles Bartowski Backup Admin Team Jonathan Frapier Scott Lowe Luther Stickell SWAT Team Daryl Dixon Rick Grimes Glen Rhee Eric Wright Server Admin Team Akmal Waheed Mike Laverick Dade Murphy Virtual Desktop Admin Team Josh Atwell Angelo Luciani Emanuel Goldstein
  • 13. Appendix B: Document Maintenance Responsibilities and Revision History This section identifies the individuals and their roles and responsibilities for maintaining this Disaster Recovery Plan. Primary Disaster Recovery Plan document owner is:Joe Graziano Primary Designee: Joe Graziano Alternate Designee #1: Jonathan Frapier Alternate Designee #2: Akmal Waheed Name of Person Updating Document Date Update Description Version # Approved By Joe Graziano 8/20/2013 Added VMware SRM documentation 1.1 Phil N. Thropist Joe Graziano 8/21/2103 Added Alternate Designee’s Jonathan and Akmal 1.2 Phil N. Thropist Joe Graziano 8/22/2103 Completed final draft for release to Virtual Design Master 1.3 Phil N. Thropist
  • 14. Appendix C: Server Farm Details Datacenter 1 Diagrams Datacenter 2 Diagrams Datacenter 3 Diagrams
  • 15. Appendix D: Glossary/Terms Standard Operating State: Production state where services are functioning at standard state levels. In contrast to recovery state operating levels, this can support business functions at minimum but deprecated levels. Presentation Layer: Layer which users interact with. This typically encompasses systems that support the UI, manage rendering, and captures user interactions. User responses are parsed and system requests are passed for processing and data retrieval to the appropriate layer. Processing Layer: System layer which processes and synthesizes user input, data output, and transactional operations within an application stack. Typically this layer processes data from the other layers. Typically these services are folded into the presentation and database layer, however for intensive applications; this is usually broken out into its own layer. Database Layer:The database layer is where data typically resides in an application stack. Typically data is stored in a relational database such as SQL Server, Microsoft Access, or Oracle, but it can be stored as XML, raw data, or tables. This layer typically is optimized for data querying, processing and retrieval. Network Layer: The network layer is responsible for directing and managing traffic between physical hosts. It is typically an infrastructure layer and is usually outside the purview of most business units. This layer usually supports load balancing, geo-redundancy, and clustering. Storage Layer:This is typically an infrastructure layer and provides data storage and access. In most environments this is usually regarded as SAN or NAS storage. Hardware/Host Layer: This layer refers to the physical machines that all other layers are reliant upon. Depending on the organization, management of the physical layer can be performed by the stack owner or the purview of an infrastructure support group. Virtualization Layer: In some environments virtual machines (VM's) are used to partition/encapsulate a machine's resources to behave as separate distinct hosts. The virtualization layer refers to these virtual machines. Administrative Layer: The administrative layer encompasses the supporting technology components which provide access, administration, backups, and monitoring of the other layers.