Managing data to improve disaster recovery preparedness » data center knowledge
Improve your IT disaster recovery plan, and your ability to recover from disaster
1. Print Document http://my.gartner.com/portal/server.pt/gateway/PTARGS_0_24...
Improve Your IT Disaster Recovery Plan,
and Your Ability to Recover From Disaster
4 June 2012 | ID:G00234709
Kevin Knox
Many organizations have inconsistent IT disaster recovery plans that vary in quality, scope
and detail. We help disaster recovery and business continuity planners improve their IT
disaster recovery plans, and their ability to recover from disaster, by outlining best
practices for key problems.
Overview
Explore related content:
"SMB Context: 'Improve Your IT Disaster Recovery Plan, and Your Ability to Recover From
Disaster.'" (17 September 2012)
Key Challenges
Minor discrepancies, omissions and oversights in an organization's disaster recovery
plan can have a major impact on the time required to recover from a disaster and
the associated business impact.
While most organizations claim to have some form of IT disaster recovery plan in
place, there are wide-ranging differences in quality, scope and detail level from one
plan to another.
Respondents to the 2011 Gartner Risk Management Disciplines Survey were asked
which types of disasters their organizations planned for. IT outage was ranked
highest among the 13 categories, with 66% of respondents stating that they plan
for IT outages.
Recommendations
Organizations should focus their disaster recovery plans specifically on the recovery
of IT services, and should clearly define the intended use and scope of the plan as a
critical first step.
Two to three senior executives in the organization should be authorized to make a
disaster declaration, and only after specific criteria have been met to qualify the
event as a disaster.
Organizations should include the details of ongoing recovery operations and failback
processes and procedures as highlighted sections in the disaster recovery plan.
Analysis
IT organizations spend considerable time and money developing and managing IT disaster
recovery plans they hope will reduce downtime and minimize the business impact when a
disaster arises. Although most large organizations claim to have some form of IT disaster
1 of 7 9/23/12 4:09 PM
recovery plan in place — based on the numerous plan reviews Gartner performs each year
— there are significant differences in quality, scope and detail level from one plan to
another. Disaster recovery plans should be specific enough to address the individual
recovery requirements, technologies and processes of an organization. Although no two
plans are exactly alike, there are certain issues all organizations should consider and
missteps to avoid when developing their plans.
Having a focused, detailed and well-organized disaster recovery plan can mean the
difference between smooth recovery operations and chaos during a disaster. This research
looks at common mistakes organizations make within their IT disaster recovery plans, and
provides recommendations for improvement.
Define the Scope of the Plan
A common mistake organizations make when developing disaster recovery plans is not
limiting their scope exclusively to the recovery of IT services. For example, some
organizations include general business continuity requirements, which typically fall outside
the purview of IT. Despite IT service recovery being a key part of overall business
continuity, each department should have its own plan, coordinated at a high level, but
managed and owned separately.
Organizations should focus disaster recovery plans specifically on the recovery of IT
services, and should clearly define the intended use and scope of the plan as a critical first
step. This includes developing a concise statement about what's included and what's not,
who the intended audience is and how the document should be used. The scope also
should identify the specific locations, businesses, companies and functions covered by the
recovery plan.
Note: Business continuity management (BCM) ensures business resilience before, during
and after an operational disruption. BCM includes supplier management, crisis
management, emergency management, IT disaster recovery management (IT DRM),
business recovery, contingency planning and preparedness.
Identify Key Terminology
Most disaster recovery plans reviewed by Gartner fail to include a formal glossary of key
terminology and language. Because most recovery plans must address a wide variety of
individuals with varying levels of knowledge from multiple internal and external
organizations, an advanced understanding of language or terminology cannot be assumed.
A well-defined and easily accessible glossary of key terms and phrases should be included
in all disaster recovery plans. Establishing early in the recovery document a common
language and terminology — including industry-specific terms, recovery terminology,
commonly used acronyms, location and facility names, and abbreviations — helps
minimize misinterpretations and potential mistakes.
Make the Plan Easy to Use
Although it may seem a basic point, one constant with good disaster recovery plans is that
they are well-organized, easily navigated and easy to use. Organizations often structure
their recovery plans as novels instead of reference documents. Disaster recovery plans are
rarely read from front to back, and are most likely to be used during a crisis, not as leisure
reading beforehand.
To improve effectiveness and ease of use, organizations should separate their disaster
recovery plans into multiple, stand-alone sections or subdocuments. For example, a
recovery planning section covers items such as methodologies, management and program
goals, while a recovery operations section focuses on recovery processes and procedures.
Target each section to the specific audience or individual role, and format and organize the
plan for the targeted user and by content (see Table 1).
Table 1. Recovery Planning and Recovery Operation: Document Differences
Item                  Recovery Planning        Recovery Operations
Target                IT leaders               IT operations
Formatting            Paragraphs and sections  Bulleted lists
Order                 Varied                   Sequential
Writing               Detailed                 Straightforward and concise
Indexed               Not important            Highly important
Knowledge assumption  High                     Low
Source: Gartner (June 2012)
Reference Roles, Not Individuals' Names
Having an accurate and up-to-date recovery plan is critical for success. Unfortunately, it is
not uncommon for recovery plans to be out of date. Organizations typically do not update
their plans frequently enough to keep pace with the rate of personnel changes associated
with the individuals who are assigned recovery responsibilities. This opens the door for
tasks to be assigned to people who are no longer in the required role, have left the
company or have changed their contact information.
Avoid the use of individuals' names and contact information in the recovery document, and
use roles and job titles instead. References to roles and job titles can be indexed against
an appendix of individual names and contact information. This way, only the appendix
needs to be updated on a regular basis, and that update can be automated via standard
HR reports.
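One way to picture the role-indexed appendix is the minimal Python sketch below. The role titles, names, phone numbers and function names are hypothetical and chosen only for illustration; the point is that the plan steps never change when personnel do, and that a simple check can flag roles the appendix no longer covers.

```python
# Hypothetical plan steps: each names a role or job title, never an individual.
PLAN_STEPS = [
    ("Declare the disaster", "Chief Information Officer"),
    ("Restore the database tier", "Database Administrator"),
    ("Notify facilities", "Facilities Manager"),
]

# Hypothetical appendix, e.g. regenerated from a standard HR report export.
CONTACT_APPENDIX = {
    "Chief Information Officer": {"name": "A. Example", "phone": "555-0100"},
    "Database Administrator": {"name": "B. Example", "phone": "555-0101"},
}


def resolve_contacts(steps, appendix):
    """Pair each step with its current contact; None marks a role with no entry."""
    return [(task, role, appendix.get(role)) for task, role in steps]


def unfilled_roles(steps, appendix):
    """Return the roles the plan references that the appendix does not cover."""
    return sorted({role for _, role in steps if role not in appendix})
```

Running `unfilled_roles(PLAN_STEPS, CONTACT_APPENDIX)` after each appendix refresh would surface gaps (here, the Facilities Manager role) before a disaster does.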
Address Ongoing Recovery and Failback, as Well as Failover
Most disaster recovery plans Gartner reviews focus almost exclusively on failover
processes and procedures. These plans usually fail to include adequate levels of detail, if
any details are addressed at all, on what should happen in operations after a disaster
failover occurs, or on re-establishing production operations via failback.
Ongoing recovery operations and failback procedures are almost as important as failover,
and should be covered in detail in all disaster recovery plans. Organizations should ensure
that disaster postmortem processes are established to understand the root cause of the
disaster and how it impacted IT, and to assess recovery performance.
Consider the Types of Disasters to Plan For
What types of disasters should organizations plan for? Two common approaches to
answering this question are:
One size fits all — where all types of disaster scenarios are treated the same
Individual subplans to address a wide array of potential disaster scenarios
While there is no right answer, many recovery plans we review are overly general or too
comprehensive and complex.
Organizations should plan for disaster scenarios based on their ability to manage and
benefit from including the various scenarios. Scenarios based on criteria such as
notification time (e.g., a tornado warning is in effect starting tomorrow at 12 noon), type
of disaster and potential business impact should be established only if material differences
exist in the way the type of disaster is managed. Organizations should avoid planning for
disasters that are highly unlikely to occur (e.g., a blizzard in the Caribbean).
Figure 1 shows 2011 Gartner Risk Management Disciplines Survey respondents' answers to
the question, "What disaster scenarios does your organization plan for in its business
continuity management efforts?"
Figure 1. Common Disasters Organizations Plan for in BCM Efforts
N = 159
Source: Gartner (June 2012)
Maintain Version and Configuration Control
Maintaining consistency between production and recovery environments remains one of
the biggest disaster recovery testing and exercising challenges organizations face. While
configuration and asset management tools can help, few organizations use them or other
tools as part of ongoing disaster recovery plan updates.
Establish formal processes via the use of management tools and libraries, or manually, to
ensure that all hardware and software references in a disaster recovery plan are up to
date, and represent actual production and recovery configurations. Specific version and
patch-level details should be included for all hardware, software and OSs, and these should
be updated on a regular basis. For example, it is insufficient to state Windows 2000 in the
recovery plan for a server running Windows 2000 Advanced Server Service Pack 4.
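As a rough illustration of such a check, the Python sketch below compares the version strings documented in a plan against those reported for the actual environment. The host name and the `find_drift` function are hypothetical; in practice the "actual" side would come from whatever configuration or asset management tool the organization uses, which is not modeled here.

```python
def find_drift(plan_inventory, actual_inventory):
    """Return {host: (documented, actual)} for every mismatched or missing host."""
    drift = {}
    for host in sorted(set(plan_inventory) | set(actual_inventory)):
        documented = plan_inventory.get(host)  # None if the plan omits the host
        actual = actual_inventory.get(host)    # None if the host no longer exists
        if documented != actual:
            drift[host] = (documented, actual)
    return drift


# Hypothetical inventories using the note's own example of an underspecified entry.
plan = {"app01": "Windows 2000"}
actual = {"app01": "Windows 2000 Advanced Server Service Pack 4"}
drift = find_drift(plan, actual)
```

An exact string comparison is deliberately strict: it treats an entry that omits the edition or service pack as drift, which matches the level of detail recommended above.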
Codify What Constitutes a Disaster
Defining what qualifies as a disaster and how it is declared are key considerations not
covered by most recovery plans in adequate detail or focus. Yet, this is especially
important, given the cost and potential level of disruption associated with declaring a
disaster.
Organizations must ensure that processes and safeguards are established and documented
within the disaster recovery plan to protect against mistaken declarations. Two to three
senior executives should be authorized to declare a disaster, and this should occur only
after specific criteria have been met to qualify the event as a disaster. Similar processes
and criteria should be established to declare the end of a disaster, and to initiate failback
procedures.
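The declaration safeguard can be expressed as a simple rule, sketched below in Python. The role titles and criterion names are hypothetical placeholders, not a recommended list; each organization would substitute its own authorized roles and qualifying criteria.

```python
# Hypothetical: the two to three senior roles authorized to declare a disaster.
AUTHORIZED_ROLES = frozenset({
    "Chief Executive Officer",
    "Chief Information Officer",
    "Chief Operating Officer",
})


def may_declare_disaster(declarer_role, criteria_met):
    """Allow a declaration only from an authorized role with every criterion met.

    criteria_met maps a criterion name to True or False. An empty mapping is
    rejected so that a declaration cannot pass without any criteria evaluated.
    """
    if declarer_role not in AUTHORIZED_ROLES:
        return False
    return bool(criteria_met) and all(criteria_met.values())
```

For example, `may_declare_disaster("Chief Information Officer", {"primary site unavailable": True, "outage expected to exceed recovery time objective": True})` returns `True`, while the same call from an unauthorized role, or with any criterion unmet, returns `False`. A matching rule with its own criteria could gate the end-of-disaster declaration and the start of failback.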
Include Testing in the Disaster Recovery Plan
Disaster recovery testing is challenging and expensive, but is a critical component of
disaster recovery preparedness. Given the time and money spent on disaster recovery
testing, it is surprising we don't see it called out more regularly or covered in enough detail
within disaster recovery plans.
Testing should be a highlighted section of all disaster recovery plans, and should include
specific details, such as when it is scheduled throughout the year, what types of tests are
planned, which applications or business functions will be tested, and what testing
processes and procedures should be followed. Besides physical recovery testing,
organizations should establish a regular "paper test" schedule of when major reviews and
walk-throughs of the recovery plan occur (see "Best Practices for Planning and Managing
Disaster Recovery Testing").
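A testing section with this level of detail lends itself to a simple coverage check. The Python sketch below uses a hypothetical schedule (the dates, test types and business functions are invented for illustration) to list critical functions that no scheduled test of a given type covers.

```python
from datetime import date

# Hypothetical schedule entries: when the test runs, its type, and what it covers.
TEST_SCHEDULE = [
    {"when": date(2012, 3, 15), "type": "paper test", "covers": {"email", "payroll"}},
    {"when": date(2012, 9, 20), "type": "physical recovery", "covers": {"email"}},
]


def untested_functions(critical_functions, schedule, test_type):
    """Return critical functions that no scheduled test of test_type covers."""
    covered = set()
    for entry in schedule:
        if entry["type"] == test_type:
            covered |= entry["covers"]
    return sorted(set(critical_functions) - covered)


gaps = untested_functions(
    {"email", "payroll", "order entry"}, TEST_SCHEDULE, "physical recovery"
)
```

Here `gaps` lists the functions that have no physical recovery test on the calendar, which is the kind of omission a documented schedule makes visible.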
Consider the Communication Infrastructure
The communication infrastructure is a top recovery priority for many organizations.
However, since it is not necessarily seen as an application or a business service, it is not
always called out or prioritized appropriately within disaster recovery plans.
The communication infrastructure should be considered a high-priority recovery function,
and treated similarly to other mission-critical business services. This is especially important
when business continuity functions such as an emergency response system might depend
on the availability of the communication infrastructure for operation. Even for execution of
the recovery plan, primary and alternative communication methods should be established
and documented.
Recommended Reading
Some documents may not be available as part of your current Gartner subscription.
"Best Practices for Planning and Managing Disaster Recovery Testing"
"Ten Best Practices for Creating and Maintaining Effective Business Continuity Management
Plans"
"Define, Develop and Verify Plans for Application Availability and Recoverability"
"Recent IT Outages Beg the Question: Who's Minding the Data?"