Resilience And Failure Obviation Software Engineering
1. 0010101101100110
1011000110100101
1101011100101001 Resilience & Failure Obviation Based Software Engineering
0101100011101110
A Resilience & Failure Obviation
Based Approach to Software
Safety Engineering
Donna A. Dulo
US Department of the Army
25 FEB 09
SW 4936 US Naval Postgraduate School 1
The concept of failure…is central to understanding
engineering, for engineering design has as its first
and foremost objective the obviation of failure.
- Henry Petroski
Resilience is the ability of systems to prevent or
adapt to changing conditions in order to maintain
control over a system property…to ensure safety…
and to avoid failure.
- Hollnagel, Woods, & Leveson
[1] Petroski [2] Hollnagel, et al.
Two Separate Concepts of General Engineering…
Resilience Engineering
Failure Obviation Engineering
…Applied to Software Engineering
% of Functions Performed by Software, by Weapon System
Weapon System (Year)    % of Functions Performed by Software
F-4 (1960)              8
A-7 (1964)              10
F-111 (1970)            20
F-15 (1975)             35
F-16 (1982)             45
B-2 (1990)              65
F-22 (2000)             80
[3] AFIT
DoD Software Success Rate
[Chart: share of DoD software by outcome, in the categories Not Used, Cancelled, Modified, Minor Changes, and Used As Is; values shown: 29%, 46%, 20%, 2%, 3%]
[4] DoD
Resilience Engineering
• A paradigm for safety management and design that focuses on
helping organizations cope with complexity under pressure to
achieve success
• A resilient organization treats safety as a core value, not as a
commodity that can be counted
• Contrasts with current safety engineering paradigms, which focus on
tabulating errors
• Invests in anticipating the changing potential for failure
• Creates foresight to anticipate the changing shape of risk before
failure occurs
[2] Hollnagel, et al.
Safety Engineering
• Focuses on systems that will execute within a specified context
without contributing to hazards
• Central concept: mathematical analysis and model-based
identification of system component faults, failures, and errors
• System hazard reduction and elimination
• Methodologies:
  ● Fault Trees
  ● Hazard & Operability Analysis Models
  ● Qualitative & Probabilistic Models
[5] Leveson
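Fault trees combine the probabilities of basic events through AND and OR gates to estimate the probability of a top-level hazard. The Python sketch below evaluates such a tree bottom-up. It assumes statistically independent basic events, which a real analysis must justify, and the event names and probabilities in the example (`nav_db_error`, `sensor_fault`, `monitor_fails`) are hypothetical.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class BasicEvent:
    name: str
    probability: float  # probability that this event occurs


@dataclass
class Gate:
    kind: str  # "AND" or "OR"
    children: list = field(default_factory=list)  # BasicEvent or Gate nodes


def top_event_probability(node) -> float:
    """Evaluate a fault tree bottom-up, assuming independent basic events."""
    if isinstance(node, BasicEvent):
        return node.probability
    child_probs = [top_event_probability(child) for child in node.children]
    if node.kind == "AND":
        # Output occurs only if every input occurs.
        return prod(child_probs)
    if node.kind == "OR":
        # Output occurs if at least one input occurs: 1 - P(no input occurs).
        return 1.0 - prod(1.0 - p for p in child_probs)
    raise ValueError(f"unknown gate kind: {node.kind}")


# Hypothetical tree: the hazard occurs if navigation data is wrong, OR if a
# sensor faults AND the monitor meant to catch it also fails.
hazard = Gate("OR", [
    BasicEvent("nav_db_error", 0.001),
    Gate("AND", [BasicEvent("sensor_fault", 0.01), BasicEvent("monitor_fails", 0.05)]),
])
print(f"P(top event) = {top_event_probability(hazard):.7f}")
```

Here the AND gate contributes 0.01 × 0.05 = 0.0005, and the OR gate gives 1 − (0.999)(0.9995) ≈ 0.0015. Multiplying at AND gates holds only under the independence assumption; a common-cause failure, such as one software defect affecting redundant channels, violates it.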
Resilience Engineering vs. Safety Engineering, converging on Safe & Reliable Systems Operation
Resilience Engineering            Safety Engineering
Organization Centric              System Centric
Safety as a Core Value            Safety as a Thing
Failure Anticipation              Failure Reduction
Foresight                         Probabilistic Mathematics & Analysis
Organizational Adaptability
Reliability Engineering
• Developing systems that reach the market at the right time, at
an acceptable cost, with satisfactory reliability and availability
• Concerned primarily with the characteristics of a system expressed by
the probability that the system will perform its required function in the
specified manner over a given period of time under a specified set of conditions
• Achieving the correct balance of reliability/availability, delivery time,
cost, and ease of maintenance based on customer needs
• Quantitative characterization of expected use & quality characteristics
• Treats safety as a subset of reliability
[6] Musa
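The probability-based definition above becomes directly computable when the failure intensity λ (failures per execution hour) is constant: the probability of failure-free execution over t hours is R(t) = exp(−λt), and inverting it turns a reliability target into a failure intensity objective. A minimal Python sketch under that constant-intensity assumption; the function names and example values are hypothetical.

```python
import math


def reliability(failure_intensity: float, exec_hours: float) -> float:
    """Probability of failure-free execution for exec_hours, assuming a
    constant failure intensity given in failures per execution hour."""
    if failure_intensity < 0 or exec_hours < 0:
        raise ValueError("failure intensity and time must be non-negative")
    return math.exp(-failure_intensity * exec_hours)


def failure_intensity_objective(target_reliability: float, exec_hours: float) -> float:
    """Constant failure intensity needed to reach target_reliability over exec_hours."""
    if not 0.0 < target_reliability <= 1.0 or exec_hours <= 0:
        raise ValueError("need 0 < target_reliability <= 1 and exec_hours > 0")
    return -math.log(target_reliability) / exec_hours


# Hypothetical values: 0.001 failures/exec hr over a 10-hour mission.
print(f"R = {reliability(0.001, 10.0):.4f}")
# Failure intensity objective for 99% reliability over the same mission.
print(f"lambda = {failure_intensity_objective(0.99, 10.0):.6f} failures/exec hr")
```

With 0.001 failures per execution hour, a 10-hour mission completes without failure with probability of about 0.990.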
Failure Obviation Engineering
• A new term based on Petroski’s concept of failure elimination in
engineering
• A focus on failure can lead to success, as the most successful
improvements in a system are those that address its limitations
and failures
• A reliance on successful precedents can lead to failure
• Success is not simply the absence of failure; it can also mask
potential modes of failure
• Success and failure are intertwined
• Intensive analysis of failure case studies
[7][8] Petroski
Reliability v. Failure Intensity
[Figure: reliability (scale to 1.0) and failure intensity (failures/exec hr) plotted against time (exec hr)]
[6] Musa
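One way to illustrate how reliability and failure intensity relate over execution time is Musa's basic execution time model. In this model, failure intensity decays as faults are found and removed during test: λ(τ) = λ0·exp(−λ0τ/ν0), where λ0 is the initial failure intensity and ν0 is the total number of failures expected in infinite time. The sketch below assumes that model; other reliability growth models behave differently. It also holds the intensity reached at test time τ constant for the duration of a mission and uses hypothetical parameter values.

```python
import math


def failure_intensity(tau: float, lambda0: float, nu0: float) -> float:
    """Basic execution time model: failure intensity after tau execution
    hours of test, given initial intensity lambda0 and nu0 total expected failures."""
    return lambda0 * math.exp(-lambda0 * tau / nu0)


def expected_failures(tau: float, lambda0: float, nu0: float) -> float:
    """Mean number of failures experienced after tau execution hours."""
    return nu0 * (1.0 - math.exp(-lambda0 * tau / nu0))


def mission_reliability(tau: float, mission_hours: float, lambda0: float, nu0: float) -> float:
    """Reliability of a mission run after tau hours of test, treating the
    failure intensity reached at tau as constant (no fixes during the mission)."""
    return math.exp(-failure_intensity(tau, lambda0, nu0) * mission_hours)


# Hypothetical parameters: 10 failures/exec hr initially, 100 failures expected
# in total, and a 0.1 exec hr mission.
LAMBDA0, NU0, MISSION_HOURS = 10.0, 100.0, 0.1
for tau in (0.0, 10.0, 20.0, 40.0):
    lam = failure_intensity(tau, LAMBDA0, NU0)
    r = mission_reliability(tau, MISSION_HOURS, LAMBDA0, NU0)
    print(f"tau={tau:5.1f} h  lambda={lam:6.3f} failures/h  R={r:.4f}")
```

As test time accumulates, failure intensity falls and mission reliability rises. In this model λ also equals λ0(1 − μ/ν0), where μ is the expected number of failures experienced so far, which offers a convenient cross-check.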
Failure Obviation Engineering vs. Reliability Engineering, converging on Safe & Reliable Systems Operation
Failure Obviation Engineering     Reliability Engineering
Failure Centric                   Success Centric
Failure as Learning               Success as Learning
Anti-Patterns                     Patterns
Case Studies                      Operational Profiles
Organizational & System Focus     System Focus
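Operational profiles, listed on the reliability side of the comparison, describe how often each operation is expected to occur in the field, and test effort can be allocated in proportion to those occurrence probabilities. A minimal Python sketch of proportional allocation, assuming a largest-remainder rounding rule so the counts sum exactly; the rounding rule, the operation names, and the probabilities are illustrative assumptions, not taken from this deck.

```python
from math import floor


def allocate_tests(profile: dict[str, float], total_tests: int) -> dict[str, int]:
    """Split total_tests across operations in proportion to occurrence
    probability, using largest-remainder rounding so the counts sum exactly."""
    total_p = sum(profile.values())
    if total_p <= 0:
        raise ValueError("occurrence probabilities must sum to a positive value")
    quotas = {op: total_tests * p / total_p for op, p in profile.items()}
    counts = {op: floor(q) for op, q in quotas.items()}
    leftover = total_tests - sum(counts.values())
    # Give the remaining tests to the operations with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda op: quotas[op] - counts[op], reverse=True)
    for op in by_remainder[:leftover]:
        counts[op] += 1
    return counts


# Hypothetical operational profile for a flight management function.
profile = {"enter_waypoint": 0.60, "modify_route": 0.25, "select_approach": 0.10, "hold_entry": 0.05}
print(allocate_tests(profile, 200))
```

Dividing by the probability total lets the function accept a profile whose probabilities do not sum exactly to 1, which is common after floating-point arithmetic or when relative frequencies are used.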
[Diagram: Reliability Engineering, Resilience Engineering, Safety Engineering, and Failure Obviation Engineering surrounding a Safe System]
[Diagram: Reliability Engineering, Resilience Engineering, Safety Engineering, and Failure Obviation Engineering surrounding a Safe System; Reliability Engineering + Safety Engineering labeled “Traditional Focus”]
[Diagram: Reliability Engineering, Resilience Engineering, Safety Engineering, and Failure Obviation Engineering surrounding a Safe System; Resilience Engineering + Failure Obviation Engineering labeled “My Focus”]
Leading & Seminal Researchers
• Reliability Engineering: John Musa, Debra Herrmann, David Smith
• Resilience Engineering: Erik Hollnagel, David Woods, Nancy Leveson
• Safety Engineering: Nancy Leveson, Shari Lawrence Pfleeger, Richard Stephans
• Failure Obviation Engineering: Henry Petroski, Charles Perrow, Dietrich Dörner
My Research Methodologies
- Intensive investigation of case studies of software-related accidents
in which software was a leading or contributing factor
  - NTSB Accident Reports
  - International Accident Reports
  - NASA & ESA Accident Reports
  - Military Accident Reports
- Both accidents and incidents investigated, looking for system failures,
not just high casualty counts
- One or more Delphi studies
  - Civilian & military experts
My Research Goals
- Investigate and discover all possible cases involving software
- Inspect thousands of reports to develop a software accident
database for analysis
- Investigate beyond traditional case examples (Therac-25,
Ariane 5, Mars Polar Lander, Patriot Missile System, etc.)
- Discover overlooked case studies (e.g., “pilot” error or “system”
error that was really software error)
- Trend analysis and common threads
- Using the above results, develop a resilience model
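The goals above (building a software accident database, spotting cases officially attributed to “pilot” error, and looking for common threads) can be sketched as a small record type plus two queries. The schema, field names, and factor tags below are hypothetical; the three records paraphrase the case studies presented later in this deck.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AccidentRecord:
    # Hypothetical schema for one entry in a software accident database.
    event: str
    year: int
    listed_cause: str  # cause named in the official report
    software_factors: list[str] = field(default_factory=list)  # software issues found on re-analysis


records = [
    AccidentRecord("Air New Zealand DC-10, Mt. Erebus", 1979, "pilot error",
                   ["navigation data programming", "HCI", "system protocol"]),
    AccidentRecord("American Airlines, Colombia", 1995, "pilot error",
                   ["FMS input interpretation"]),
    AccidentRecord("Adam Air Flight 574", 2007, "pilot error",
                   ["inertial reference system failure", "autopilot disengagement"]),
]


def overlooked_software_cases(db: list[AccidentRecord]) -> list[AccidentRecord]:
    """Cases officially attributed to pilot error where software was also implicated."""
    return [r for r in db if r.listed_cause == "pilot error" and r.software_factors]


def factor_frequencies(db: list[AccidentRecord]) -> Counter:
    """Count how often each software factor tag recurs across the database."""
    return Counter(tag for r in db for tag in r.software_factors)


print(len(overlooked_software_cases(records)))
print(factor_frequencies(records).most_common(3))
```

Trend counts like these are only meaningful if factor tags come from a controlled vocabulary, since free-text causes from different investigating agencies rarely match word for word.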
Case Study Example #1
Air New Zealand DC-10 crash into Mt. Erebus, Antarctica, 1979
257 Fatalities, Total Hull Loss
Primary Listed Cause:
- Pilot error due to low altitude and whiteout effects
Discovered Issue:
- Navigation software was programmed incorrectly, and the pilots
were unaware of the error
- The pilots were not where they thought they were
geographically
- Software HCI issues and software system protocol issues
Case Study Example #2
American Airlines Flight 965 crashes into a mountain in Colombia
159 Fatalities, 4 Serious Injuries, Total Hull Loss
Primary Listed Cause: Pilot error during night flight
Discovered Issue:
- Flight management system software interpreted pilot
input incorrectly and turned the aircraft in the wrong direction
- Internal memo from Honeywell Air Transport Systems
to Jeppesen, the software manufacturer, 11 months
before the accident:
“It could cause a large incident if these
[software] problems in the flight support
system are left unresolved.”
Case Study Example #3
Adam Air Flight 574, 1 Jan 2007: crashed into the sea
off Sulawesi, Indonesia
102 Fatalities, Total Hull Loss
Listed Cause: Pilot Error, Spatial Disorientation
Major Contributing Cause:
- Failure of the Inertial Reference System
- Software disengaged the autopilot without the pilots' knowledge
- The aircraft rolled 35 degrees to the right after the
autopilot disengaged
- The pilots could not recover from the roll
Potential Research Papers
“Applying Resilience Engineering to Safety Critical Software Systems”
“Failure Obviation Engineering: A New Concept in Developing Safe Software”
“Resilience and Failure Obviation Engineering: A New Paradigm for Developing
Safety Critical Software Systems”
“Current Trends in Safety Critical Software Failures”
Interesting note:
“Silver Bullet?”
We’ll see…
References
[1] Petroski, H. (1992). To Engineer Is Human: The Role of Failure in Successful Design. Vintage Books. New York, NY.
[2] Hollnagel, E., Woods, D. D., & Leveson, N. (Eds.). (2006). Resilience Engineering: Concepts and Precepts. Ashgate. Burlington, VT.
[3] Air Force Institute of Technology. (2001). VV&T Class Slides.
[4] US Department of Defense. (1999). Joint Warfare Application Seminar.
[5] Leveson, N. (1995). Safeware: System Safety and Computers. Addison-Wesley. New York.
[6] Musa, J. (2004). Software Reliability Engineering, 2nd Ed. AuthorHouse. Bloomington, IN.
[7] Petroski, H. (2006). Success Through Failure: The Paradox of Design. Princeton University Press. Princeton, NJ.
[8] Petroski, H. (1992). To Engineer Is Human: The Role of Failure in Successful Design. Vintage Books. New York, NY.