Disaster recovery, emergency response, and business continuity plans are usually developed when no disaster exists. We think we’ve covered all contingencies. We think we’ve trained all the appropriate players. We’ve tested. We’ve re-tested. We think we’re ready to face whatever event is looming out there with our name on it! The real world has a nasty habit of triggering disasters at the least opportune time, often featuring a twist that throws plans into disarray.
This presentation focuses on three real-world plans, each of which had a fatal flaw. We will discuss elements that should be in a plan beyond the normal guidance from the Disaster Recovery Institute (DRI), and a set of actions that should be included in planning and preparation.
Harry Regan - It's Never So Bad That It Can't Get Worse
1. It’s Never So Bad
That It Can’t Get Worse
A REVIEW OF DISASTER RECOVERY AND
BUSINESS CONTINUITY PLANNING IN PRACTICE
HARRY REGAN
VP, SECURITY CONSULTING SERVICES
VALERIE THOMAS
SENIOR SECURITY TECHNOLOGIST
SECURICON, LLC
HTTP://WWW.SECURICON.COM
2. Agenda
• Who We Are
• Things DRI Tells You
• The Magic of Mixing Technology and Humans
• 3 Tales from the Field
o Clouds of 9/11
o What if they threw a disaster and nobody came?
o Financial Services and Y2K
• Scar Tissue and Recommendations
• Conclusions and Q&A
3. Who are we?
• Securicon is a 13+ year-old security consultancy
specializing in security programs and engineering, both
cyber and physical.
• Broad base of experience in integrating human and
social issues into the implementation and impact of
security
• Enterprise-level experience in developing COOP and
BCP plans.
4. The Magic of Mixing Technology and
Humans
• Technology makes the world work
• Humans make the world weird
• Business Continuity happens at the intersection of
people and technology – with one or more
emergencies thrown into the mix.
• Plans may be concise and logical, but human
behavior is not as predictable as we’d like.
• “When the first shot is fired, battle plans go out the
window”
-- George Patton
5. Reality…
• We’re going to examine three actual case studies
from three different industries.
• All three companies involved had a good Business
Continuity Plan
• All three had a major failure when the disaster really
arrived
6. Things DRI Tells You…
Key Objectives…
• Safety is the #1 priority in an emergency/disaster
• Keep the business operating and revenue flowing
• Maintain basic communications (e-mail, phone)
• Suck it up! Don’t give customers a reason to worry
(Web site up, services available and shipping
with minimal disruptions)
• Maintain billing and accounting
7. More Things DRI Tells You…
• Your DR/BCP plan should have strategies for…
o Emergency Response and Operations
Contingencies
o Actionable and detailed Business Continuity
Plans at a situational and granular level
o Training and Awareness – for everyone, but
especially for key staff involved in the plan
o Maintaining and Testing DR and Business
Continuity Plans and Operability – and really do
it!
o Public Relations and Crisis Communications
o Coordination with Public Authorities
8. 3 Tales from the Field
• Clouds of September 11
o Hurricane Gabrielle hits Florida
• What if they threw a disaster and nobody came?
o Great plan, now where’s the staff?
• Financial Services and Y2K
o Y2K Plan used for 9/11 – successfully!
9. Clouds of September 11
• September 9, 2001 – Tropical Storm Gabrielle
forms off the west coast of Florida in the Gulf of
Mexico.
• September 11, 2001 – Hurricane Gabrielle
threatens western Florida coast.
• A manufacturing company in central Florida,
already experiencing flooding in their facility and
data center from heavy rain, decides to declare a
disaster and exercise their DR contract with IBM
• Scheduled DR site – Sterling Forest, NY
• The request “could not be accommodated”
10. Clouds of September 11
• There really was no formal plan. They had backup tapes
on site. They had arranged for specific equipment at
the DR site
• The company assumed they could just “swap over” to
the DR site and show up with the tapes when needed,
but this was never tested
• Lessons learned
o With an untested plan, it was really iffy that they could
successfully exercise the DR plan
o With a 3rd party DR contract, you may be able to get your
money back if you “can’t be accommodated”!
o Yes, their data center flooded…
11. What if they threw a disaster
and nobody came?
• Picture rolling New England hills, nestling a quaint little
mill town. In this town is a manufacturing company
that makes specialty products for the medical industry
• “Shelter in Place” is a strategy some companies adopt –
that’s the approach this company chose – backups and
redundant equipment maintained on site.
• The data center featured a natural gas generator tied to
the city gas lines, so as long as they had fuel, they had
power
• The network featured divergent carriers with failover
• They engineered their systems to be remotely
administered and operated, so there was little need for
staff to be on site – though some functions still had to
be manually attended. Robust, tested remote access
processes.
12. What if they threw a disaster
and nobody came?
• In reviewing their DR/BCP documents, it struck me
that they had a very exacting “Bob will do X,
Frank will do Y” approach. Sooner or later, they
said, they’d cross train folks.
• In May of 2006, the area experienced severe
flooding. Telecommunications were out, roads
impassable, residents evacuated from the area.
• The systems were up! No one was available to do
anything with them, but they were up!
• Discovered many processes someone had to be on
site for (e.g. IT did not control the phone system or
the PACS)
13. Financial Services and Y2K
• Large globally recognized financial services firm
with heavy transactional network traffic.
• Primary data center in southern New England,
about an hour out of NYC
• Backup data center 150 miles south.
• Standing hotel accommodations for operations
teams near both data centers
• Situational BCP built with input from each business
unit. Tested, tested, tested.
• Identification of positions that needed to be on-site
(the rest would work from home)
14. Financial Services and Y2K
• Monthly live test of failover from primary to
backup. Well understood system and network for
financial services. Business systems were lower
priority.
• NYC staff in 1 Liberty Plaza, Times Square and on
Whitehall Street
• If staff had to be displaced, they would go to one of
several locations or be issued laptops to work from
home
• Y2K – Nothing Happened
• But then there was 9/11
15. Financial Services and Y2K
• On 9/11 the first plane hit before market open – so
the decision was made not to open the market until
we knew what was really happening
• As events unfolded, the disaster plan was activated
o Liberty Plaza and Whitehall staff evacuated to Times
Square (until the South Tower collapse)
o Network transferred to Backup Site without incident
• Returned to normal operation by 9/17
• Long-term displacement of the workforce
16. Financial Services and Y2K
• On one level, the DR/BCP was successful.
o Almost seamless transition to backup (turned out
not to be necessary)
o Market systems staff was on-site, in place and
ready for normal operations when the disaster
occurred
o Corporate systems staff was generally in transit
or about to leave home – but in DC, another 9/11
target site
o Market systems were ready for scheduled market
open at 10AM, but decision was made to keep
the market closed.
o There were staff injuries, but no reported
fatalities
17. Financial Services and Y2K
• Problems with the BCP
o No plan for losing Manhattan
o Evacuation plan assumed navigable streets, availability of
public transportation
o Severe and lasting workforce displacement
o IT not ready for influx of teleworkers
• One element of dumb luck
o AT&T NYC Switch Center was destroyed in the WTC
collapse
o The company used MCI for telephone and network service
18. Scar Tissue and Recommendations
• Recurring drills are important. Annual drills are
simply not frequent enough. Test it, darn it!
• Still doing weekly/monthly backups with
incrementals? You should rethink your backup
strategy.
• Practice bare-metal restores. Even with great
planning and preparation, odds are good you’ll have
to do one or more, and they take time (see the restore
drill sketch after this slide).
• Transactional systems love to have journal
problems. Understand how to identify problems
early and quickly and how to resolve them.
• If you’re using a 3rd party backup site, expect
equipment problems. Plan for it.
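A minimal sketch of what a recurring restore drill can look like, assuming nightly tar archives plus a SHA-256 manifest; the paths, file names, and manifest format here are hypothetical and would need to match your own backup tooling:

    # A recurring restore drill: pull the newest nightly archive, restore it to a
    # scratch directory, and prove every file matches its recorded checksum.
    # BACKUP_DIR and the manifest layout ("<sha256>  <relative path>" per line)
    # are assumptions -- adjust to whatever your backup tooling actually produces.
    import hashlib
    import pathlib
    import tarfile
    import tempfile

    BACKUP_DIR = pathlib.Path("/backups/nightly")   # hypothetical local backup drop
    MANIFEST_NAME = "manifest.sha256"               # hypothetical integrity manifest

    def latest_backup() -> pathlib.Path:
        archives = sorted(BACKUP_DIR.glob("*.tar.gz"))
        if not archives:
            raise SystemExit("No backup archives found -- the drill has already failed.")
        return archives[-1]

    def verify_restore(archive: pathlib.Path) -> None:
        with tempfile.TemporaryDirectory() as scratch:
            with tarfile.open(archive) as tar:
                tar.extractall(scratch)             # restore to scratch, never production
            manifest = pathlib.Path(scratch, MANIFEST_NAME)
            if not manifest.exists():
                raise SystemExit(f"{archive.name}: no manifest, integrity cannot be proven.")
            for line in manifest.read_text().splitlines():
                if not line.strip():
                    continue
                expected, rel_path = line.split(maxsplit=1)
                restored = pathlib.Path(scratch, rel_path)
                actual = hashlib.sha256(restored.read_bytes()).hexdigest()
                if actual != expected:
                    raise SystemExit(f"{rel_path}: checksum mismatch, backup is suspect.")
        print(f"{archive.name}: restore drill passed.")

    if __name__ == "__main__":
        verify_restore(latest_backup())

A drill that only confirms the backup job completed tells you nothing about whether the data can actually be restored; run something like this on the same recurring schedule as the drills above.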
19. Scar Tissue and Recommendations
• Understand what disasters are facing your disaster
recovery sites!
• Understand the logistics of getting the right people
to the right place in different kinds of disasters!
• See if you can arrange to have your restoration
media transmitted to the DR site (see the transfer
sketch after this slide).
(Throwing the backup media in the van with the DR away
team may make the disaster even worse)
• Maintain the equipment for the DR site! It won’t
help you if the DR hardware can’t run the current
mission critical applications!
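A minimal sketch of transmitting restoration media electronically, assuming the DR site can accept an rsync-over-SSH push; the remote host, landing directory, and transport are assumptions, so substitute whatever your DR contract actually supports:

    # Ship the newest archive to the DR site over the network rather than carrying
    # tapes with the away team. The remote host, landing directory, and the choice
    # of rsync over SSH are assumptions -- use whatever transport your DR contract
    # actually supports.
    import pathlib
    import subprocess

    BACKUP_DIR = pathlib.Path("/backups/nightly")          # hypothetical local backup drop
    DR_TARGET = "drops@dr-site.example.com:/incoming/"     # hypothetical DR landing zone

    def ship_latest_backup() -> None:
        archives = sorted(BACKUP_DIR.glob("*.tar.gz"))
        if not archives:
            raise SystemExit("Nothing to ship -- the backups themselves are missing.")
        latest = archives[-1]
        # -a preserves attributes, -z compresses in transit; check=True makes a
        # failed transfer raise so monitoring notices long before a disaster does.
        subprocess.run(["rsync", "-az", str(latest), DR_TARGET], check=True)
        print(f"Shipped {latest.name} to {DR_TARGET}")

    if __name__ == "__main__":
        ship_latest_backup()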
20. Scar Tissue and Recommendations
• Cross train DR/BCP teams on ALL roles. DRI
recommends backup roles and backups to
backups. But you won’t know for sure who reports
for duty until the disaster.
“When the first shot is fired,
battle plans go out the window.”
General George Patton
21. What is this “Granular” stuff?
• It’s rare that a disaster/emergency will unfold on
your terms. The key to survival is flexibility
o Be ready for a “half disaster”
o Also be ready for multiple, simultaneous disasters
o Finally, be ready for key staff unavailability
• Situational planning is important
o Have plans built for the most likely disaster scenarios
o To the extent possible, compartmentalize
o Also have an OCISD Strategy
OCISD = “Oh crud! It’s something different!”
22. Conclusions and Q&A
If you take nothing else away from this presentation,
remember:
#1 Test. Refine. Repeat.
#2 Be flexible. It probably won’t happen like you think it will
#3 When it does happen, you’ll find out which pieces you
didn’t test enough.