A detailed overview of the business continuity / disaster recovery planning process. Gives numerous tips for effective execution of plan development. Emphasizes development of a true recovery capability through exercises which reveal weaknesses in the plan or technology leading to improvements.
Building a Business Continuity Capability
1. A Detailed Overview of
Business Continuity Planning
Rod Davis, CRISC, CBCP
Version 1.07
2. Formal education: BS Electrical Engineering
26 years in SIL International, 13 years in Mali, West Africa
Information Technology, Contingency Planning, Data
Recovery, and Business Continuity
IT Certifications: Security+, Network+, A+, MCSA
CRISC - Certified in Risk and Information Systems Control
CBCP - Certified Business Continuity Professional, DRII.org
3. Identify what/who they are in your organization.
Champion/facilitate information sharing.
Create forum for shared decision making.
[Diagram: Physical Security, Business Continuity, and Cyber-security, connected by Information Sharing]
4. Defining ‘Disaster’
Disaster Recovery is a subset of Business Continuity
Business Continuity Planning helps achieve Organizational Resilience
The Business Continuity Planning Cycle
5. If a natural disaster struck a data center rendering critical IT
services unavailable?
If a terrorist attack targeted an overseas regional center?
If a pandemic threatened global operations for your mission?
6. The occurrence of some events could cause a temporary
disruption of mission-critical services.
Some scenarios could actually result in long-term loss of
mission-critical capacity.
The ‘unthinkable’ might include disruption or shutdown of
programs that these services and capacity support.
7. Disaster – an event, which causes the loss of an
essential service, or part of it, for a length of time
which imperils mission achievement.
― Andrew Hiles, Business Continuity: Best Practices
8. Disaster – An event that compromises an
organization’s ability to provide critical functions,
processes, or services for some unacceptable period of
time.
― Disaster Recovery Journal
9. Disaster Recovery Planning: The activities associated
with the continuing availability and restoration
of the IT infrastructure.
― BCI Dictionary of BC Management Terms
10. Organizations that experience major data loss without disaster recovery plans*
• 43% never reopen
• 51% close within two years
• 6% survive long-term
* Cummings, Haag, & McCubbrey (2005). Management Information Systems for the Information Age.
11. Business Continuity Planning is the process of developing
prior arrangements and procedures that enable an
organization to respond to an event in such a manner that
critical business functions can continue within planned levels
of disruption. The end result of the planning process is the BC
Plan.
― BCI Dictionary of BC Management Terms
12. A resilient organization is one that is able to
achieve its core objectives in the face of
adversity.
― http://www.resorgs.org.nz/
14. “It does not do to leave a live dragon out of your
calculations, if you live near him.”
― ‘The Hobbit’, by J.R.R. Tolkien
Two questions to ask …
• Is he alive?
• Does he live near you?
17. Natural/Environmental Threats
• Fire
• Flood
• Hurricane
• Pandemic
• Winter storm
• Tornado
• Lightning
• Drought
• Earthquake
• Volcano
• Tsunami
Human Threats
• Fire (accidental or arson)
• Cyber-attack
• Data theft or loss
• Terrorist attack
• Sabotage/Vandalism
• Workplace violence
• Civil unrest
• Coup d'état
• Civil war
• Chemical or biological hazard
Infrastructure Threats
• Power grid failure
• Petroleum supply disruption
• Food or water contamination
• Public utility failure (water, sewer, etc.)
• Heating/Cooling system failure (affects IT & people)
• Public transport disruption
18. Threat Assessment
• Determine the most relevant threats, e.g., in your location dangerous lightning occurs frequently.
Probability Assessment
• High frequency of electrical storms = high probability of lightning strike.
Vulnerability Assessment
• Lack of lightning suppression = high vulnerability to a lightning strike.
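The three assessments above can be combined into a rough ranking. The sketch below is only an illustration: the 1-3 scale, the multiplication rule, and the names `Threat` and `risk_score` are assumptions, not something the presentation prescribes.

```python
from dataclasses import dataclass

# Hypothetical 1-3 scale: 1 = low, 2 = medium, 3 = high.
LOW, MEDIUM, HIGH = 1, 2, 3


@dataclass
class Threat:
    name: str
    probability: int    # how likely the threat is at this location
    vulnerability: int  # how exposed the organization is to it


def risk_score(threat: Threat) -> int:
    """Assumed rule: risk grows with both probability and vulnerability."""
    return threat.probability * threat.vulnerability


# Example ratings are made up for illustration.
threats = [
    Threat("Lightning strike", probability=HIGH, vulnerability=HIGH),
    Threat("Tsunami", probability=LOW, vulnerability=MEDIUM),
    Threat("Power grid failure", probability=MEDIUM, vulnerability=HIGH),
]

# Highest score first: these are the threats to look at first.
ranked = sorted(threats, key=risk_score, reverse=True)
for t in ranked:
    print(f"{t.name}: {risk_score(t)}")
```

A frequent-lightning site with no lightning suppression lands at the top of the list, matching the example on the slide.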
20. What is Business Impact Analysis?
Impact Rating System
Defining ‘Mission-Critical’
Recovery Point Objective
Recovery Time Objective
21. The process of analyzing business functions and
the effect that a business disruption might have
upon them.
― Business Continuity Institute
22. A process used to identify and prioritize:
Critical business functions and processes
Essential IT services and data
Required staff and equipment
23. Identify mission-critical business functions.
Which ones require the highest level of risk mitigation?
Determine impact of disruptions over time.
Establish recovery priorities in case of disruption.
24. Timeline: Point of last data backup | Disaster strikes! | Systems fully recovered
Data (RPO) is measured back from the disaster to the last backup; Downtime (RTO) is measured forward from the disaster to full recovery.
• RPO – Recovery Point Objective
o The maximum data loss that an organization will tolerate. Data and systems must be restored to this point after a disruption.
• RTO – Recovery Time Objective
o The maximum period of time that an organization accepts for recovery of business functions, systems, and processes.
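The two objectives can be checked against the timeline of an incident. The following is a minimal sketch; the function names, the timestamps, and the 24-hour objectives are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, disaster: datetime) -> timedelta:
    """Data created after the last backup is lost when disaster strikes."""
    return disaster - last_backup


def downtime(disaster: datetime, recovered: datetime) -> timedelta:
    """Time from the disaster until systems are fully recovered."""
    return recovered - disaster


def meets_objectives(last_backup, disaster, recovered, rpo, rto):
    """Return (rpo_met, rto_met) for one incident."""
    return (
        data_loss_window(last_backup, disaster) <= rpo,
        downtime(disaster, recovered) <= rto,
    )


# Hypothetical incident: nightly backup at 02:00, disaster at 14:00,
# systems back the following evening.
last_backup = datetime(2024, 3, 1, 2, 0)
disaster = datetime(2024, 3, 1, 14, 0)
recovered = datetime(2024, 3, 2, 20, 0)

rpo_met, rto_met = meets_objectives(
    last_backup, disaster, recovered,
    rpo=timedelta(hours=24),  # assumed objective
    rto=timedelta(hours=24),  # assumed objective
)
print(f"RPO met: {rpo_met}, RTO met: {rto_met}")
```

In this made-up incident 12 hours of data are lost (within the RPO) but the outage lasts 30 hours (beyond the RTO), so the backup schedule is adequate while the recovery procedure is not.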
25. List of mission-critical business functions and their IT
dependencies;
Recovery Time Objectives (RTOs) for these priorities
Recovery Point Objectives (RPOs) for IT assets
Recovery priorities … What do you recover first?
26. • Mission-critical business functions are those whose
sustained failure could severely impair the business or lead
to its imminent failure.
• Examples of disruptions to business functions/processes:
o Inability to meet employee payroll
o Unable to process critical bank transfers
o Critical financial data is corrupted
27. • Mission-critical refers to any network, system or
application whose sustained failure would severely disrupt
business operations.
• Examples of disruptions to technology functions include:
o Fire in a server room (destroying critical data)
o Storm causing sustained power & Internet outage
o Ransomware encrypts critical files on a data server
28. Department Managers should decide - They understand which
processes and services are most critical to their department’s successful
operation.
The Directors should decide - They know what things are critical to the
accomplishment of the overall corporate mission.
Do the IT Admins decide? Often, without clear direction from
leadership, they decide by default which data and services are critical to
protect.
Point – Leadership should decide what is mission-critical, and IT staff should
implement their decisions.
29. *Recovery times shown are arbitrary and will vary greatly depending
on the type of business.
• Mission-Critical: 12 - 48 hours
• Highest priority for rapid recoverability
• Vital: 3 - 5 days
• Essential to operations but not as critical
• Important: 1 - 4 weeks
• Long-term absence has eventual impact.
• Minor: Months
• Absence causes minimal impact.
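The rating system above can be expressed as a small lookup. This sketch uses the slide's example recovery times, which the slide itself calls arbitrary; treating each tier's upper bound as a cutoff (so a value that falls between two ranges goes to the lower-priority tier) is an assumption, as are the names `TIERS` and `impact_rating`.

```python
from datetime import timedelta

# Upper bounds from the example tiers. Using them as cutoffs is an
# assumption made for this sketch.
TIERS = [
    ("Mission-Critical", timedelta(hours=48)),
    ("Vital", timedelta(days=5)),
    ("Important", timedelta(weeks=4)),
]


def impact_rating(tolerable_outage: timedelta) -> str:
    """Map how long a function can be down to an impact rating."""
    for name, upper_bound in TIERS:
        if tolerable_outage <= upper_bound:
            return name
    return "Minor"  # anything longer than four weeks


print(impact_rating(timedelta(hours=24)))
print(impact_rating(timedelta(days=4)))
print(impact_rating(timedelta(weeks=2)))
print(impact_rating(timedelta(days=90)))
```

Each organization would replace the cutoffs with values agreed by its leadership.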
31. Use the Risk Assessment with your Recovery
Priorities to identify risk mitigation that will
produce the greatest positive impact for the least
investment.
32. Example: Risk Assessment discovers that both an IT
system and its only backup device are stored in the same
room. The Recovery Priorities mark this as a mission-
critical system.
Solution: Move the backup device to another building on
campus.
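One rough way to look for the greatest impact for the least investment is to compare candidate mitigations by estimated benefit per unit of cost. The sketch below assumes a 1-10 risk-reduction estimate and made-up cost figures; the `Mitigation` class and `value_ratio` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Mitigation:
    description: str
    risk_reduction: int  # hypothetical 1-10 estimate of positive impact
    cost: float          # hypothetical cost estimate, any currency


def value_ratio(m: Mitigation) -> float:
    """Assumed metric: estimated impact per unit of investment."""
    return m.risk_reduction / m.cost


# All figures below are invented for illustration.
candidates = [
    Mitigation("Move backup device to another building", 8, 200.0),
    Mitigation("Build a second data center", 10, 500_000.0),
    Mitigation("Install lightning suppression", 6, 3_000.0),
]

best_first = sorted(candidates, key=value_ratio, reverse=True)
for m in best_first:
    print(f"{m.description}: {value_ratio(m):.5f}")
```

With these numbers, moving the backup device ranks first: a large reduction in risk to a mission-critical system for very little cost.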
33. Don’t attempt to identify every single business function/process, and
every single server, etc.
Instead, identify RELATED business functions and GROUP them together
into LOGICAL SYSTEMS.
Example: FINANCE SYSTEM - All the component parts of that system
have to be working for that system to function …
Servers
Processes
Staff roles
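A logical system can be modelled as a group of components that must all be working. This is a minimal sketch; the class names and the example components are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    kind: str             # "server", "process", or "staff role"
    working: bool = True


@dataclass
class LogicalSystem:
    name: str
    components: list[Component] = field(default_factory=list)

    def is_functional(self) -> bool:
        """The system works only if every component part works."""
        return all(c.working for c in self.components)

    def failed_components(self) -> list[str]:
        """Names of the components that need recovery."""
        return [c.name for c in self.components if not c.working]


# Hypothetical Finance System with one failed server.
finance = LogicalSystem(
    "FINANCE SYSTEM",
    [
        Component("Accounting server", "server", working=False),
        Component("Month-end closing", "process"),
        Component("Finance manager", "staff role"),
    ],
)

print(finance.is_functional())
print(finance.failed_components())
```

Grouping this way keeps the plan at the level of a handful of systems while still recording which servers, processes, and staff roles each one depends on.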
34. Your Business Impact Analysis identifies mission-critical functions
and ranks which should be recovered first.
Choose five systems which you think are among the most mission-
critical.
Choose one from that list of five systems … Develop System
Documentation and Recovery Procedures for that one system.
Apply lessons learned from this example to your remaining systems.
35. Finance Systems
Project Funding
Power Systems
Network
Data Storage and Backup Systems
36. For in-house hosted systems, do both local backup and
cloud backup.
Examples of vendors for cloud based backup:
Crashplan (Business/Enterprise), Carbonite.
For cloud-based systems (e.g., Google Drive), use cloud-to-
cloud backup
Examples of vendors for cloud-to-cloud backup:
Datto/Backupify, Spanning
37. Minimum Deliverables
Recovery Priorities
Recovery Operations Leader
Response and Recovery Teams
Guidelines for Writing Recovery Documentation
Templates for System Documentation and Recovery Procedures
38. Business Continuity Theory
Priorities
• Mission-critical systems identified and prioritized
• RPOs, RTOs established
• Vital Records, Databases, IT Services
Teams
• Designated Roles and Responsibilities
• Contact Information
Procedures
• Recovery Procedures for Mission-Critical Functions, Processes, Systems
• Business Owners test/certify recovered systems.
Criteria
• Plan Activation: Transition Point from Emergency Response to Plan Activation
• Declaration: Disruptive Event to Disaster
40. Has authority to declare an emergency or disaster
Can direct folks to stay home or move to alternate
location
Allows departmental staff and IT Department to
focus on recovery effort
41. Primary focus is on initial response to event
Ensures safety
Secures IT assets
Gives preliminary recovery time estimates
42. Disaster Recovery Team
Focus is on recovery from a disruptive event.
Business
• Recover business function
• Activity of business owner
Information Technology
• Recover IT systems
• Activity of IT Administrators
Validation
• Departmental staff validates functionality of services
43. Documentation should be developed by the system owner,
i.e., the one who by default manages that system.
In other words, don’t assign a specific individual to write all
the documentation; task the people responsible for those
systems to write it.
45. The person writing the recovery procedure should write it
with the following assumptions:
The person performing the recovery is not normally
responsible for this service.
The person performing the recovery has sufficient
competence.
46. Recovery Procedure for Business Functions
Include key staff roles and actual recovery procedure.
48. Do not attempt to design for the worst case scenario.
Initially focus your efforts on recovery from smaller scale events, e.g.,
Finance Systems has failed, needs to be rebuilt and tested.
Grow your planning efforts to handle more disastrous events, e.g.,
A fire has destroyed your data center, all in-house hosted systems
are down and need to be rebuilt.
50. • The goal of testing your disaster recovery plan is not to find out if it
works, but to determine where it fails.
• A planned test should never, never cause a business interruption!
o Don’t lose your data in the process of testing your data recovery plan!
o Don’t shut down a mission-critical service as a result of the test.
51. • Testing the disaster recovery plan reveals weaknesses and
also trains staff.
• As you execute the test, weaknesses are revealed.
• The staff evaluates the results of the test, and this helps staff
to ‘own’ the plan.
• Training staff helps test your plan
• As you describe the plan to your staff, they may notice
inconsistencies or weaknesses.
52. • Document Review – Validate the disaster recovery plan via stakeholders’
review of the recovery documentation.
• Table-Top Testing – Simple walkthrough of the plan in a safe environment,
e.g., conference room.
• Advanced Table-top Exercise – Directed simulation of activating the disaster
recovery plan against a specific business disruption scenario.
• Component Test - Evaluation of a single threat event impacting a single
mission-critical function.
• Comprehensive Simulation - An exercise to evaluate overall recovery
capability in a high-stress environment.
53. Proverbs 21:5 Good planning and hard work lead to
prosperity, but hasty shortcuts lead to poverty.
“You can't plow a field simply by turning it over in your
mind.” ― Gordon B. Hinckley
54. Business Continuity Planning focuses on recovery of the broader business; Disaster
Recovery Planning focuses on recovery of its IT Infrastructure.
Both are closely related and use a nearly identical process, and realistically you cannot fully
consider one without the other.
Use the Risk Assessment with your Recovery Priorities to identify high ROI risk
mitigations.
Identify RELATED business functions and GROUP them together into LOGICAL SYSTEMS
Pick the top five business functions and/or IT systems, then pick one, complete the BC/DR
planning process for that one.
Then apply lessons learned to the other four and eventually to the remaining systems.
56. Business Continuity Institute - TheBCI.org
Six ‘Good Practice Guidelines’
Disaster Recovery Institute International - DRII.org
Ten professional practices
The International Consortium for Organizational Resilience - TheICOR.org
Disaster Recovery Journal – drj.com
57. Dictionary of Business Continuity Management Terms:
Business Continuity Institute - BCI
http://www.thebci.org/glossary.pdf
International Glossary for Resiliency
maintained by DRI International
https://www.drii.org/glossary.php
Business Continuity Glossary by DRJ
Disaster Recovery Journal
http://www.drj.com/resources/tools/glossary-2.html
58. ISO 22301:2012 - Societal security
This has emerged as the predominant ‘gold standard’.
NIST Special Publication 800-34 Rev. 1 - Contingency Planning Guide for
Federal Information Systems
Editor's notes
The other parts of organizational resiliency are disaster recovery (part of business continuity), crisis management (protection of personnel and other assets, crisis communications with stakeholders and press), and emergency management (protection of people during the immediate stage of a crisis, communication with fire/police, etc.)
The previous lesson guided you through making a risk assessment. This lesson helps you to identify mission-critical business functions, their supporting IT services, and create meaningful metrics to measure acceptable service downtime and data loss.