0010101101100110
  1011000110100101
  1101011100101001   Resilience & Failure Obviation Based Software Engineering
  0101100011101110




       A Resilience & Failure Obviation
        Based Approach to Software
             Safety Engineering

                                 Donna A. Dulo
                            US Department of Army
                                   25 FEB 09



SW 4936                       US Naval Postgraduate School                  1




      The concept of failure…is central to understanding
      engineering, for engineering design has as its first
      and foremost objective the obviation of failure.
                                                             - Henry Petroski


       Resilience is the ability of systems to prevent or
       adapt to changing conditions in order to maintain
       control over a system property…to ensure safety…
       and to avoid failure.
                                         - Hollnagel, Woods, & Leveson

                                                                   [1] Petroski [2] Hollnagel, et al.







Two Separate Concepts of General Engineering…

            Resilience Engineering

            Failure Obviation Engineering

                       …Applied to Software Engineering







      % of Functions Performed by Software, by Weapon System          [3] AFIT

           Weapon System    Year    Percentage
           F-4              1960         8
           A-7              1964        10
           F-111            1970        20
           F-15             1975        35
           F-16             1982        45
           B-2              1990        65
           F-22             2000        80







        DoD Software Success Rate                                      [4] DoD

           Chart categories: Not Used, Cancelled, Modified, Minor Changes, Used As Is
           Chart values: 46%, 29%, 20%, 3%, 2%







      Resilience Engineering
           • A paradigm for safety management and design which focuses on
           helping organizations to cope with complexity under pressure to
           achieve success
           • A resilient organization treats safety as a core value, not a
           commodity that can be counted
           • Contrasts with current safety engineering paradigms of tabulating
           error
           • Invests in anticipating the changing potential for failure
           • Creates foresight to anticipate the changing shape of risk before
           failure occurs
                                                                          [2] Hollnagel, et al.







      Safety Engineering
           • Focuses on systems that will execute within a specified context
           without contributing to hazards
           • Central concept: mathematical analysis and model based
           identification of system component faults, failures, and errors
           • System hazard reduction and elimination
           • Methodologies:
                     ● Fault Trees
                     ● Hazard & Operability Analysis Models
                     ● Qualitative & Probabilistic Models
                                                                             [5] Leveson
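
As a rough illustration of the fault tree methodology listed above, the sketch below computes the probability of a top-level hazard from basic events joined by AND and OR gates. It assumes the basic events are statistically independent, which real systems often violate. The event names and probabilities are hypothetical and do not come from the slides or from Leveson.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Basic:
    """A basic event with an assumed probability of occurring."""
    name: str
    p: float


@dataclass
class Gate:
    """An AND or OR gate combining child events."""
    kind: str  # "AND" or "OR"
    children: List["Node"]


Node = Union[Basic, Gate]


def probability(node: Node) -> float:
    """Probability of the event at this node, assuming independent children."""
    if isinstance(node, Basic):
        return node.p
    probs = [probability(child) for child in node.children]
    if node.kind == "AND":
        # All children must occur: multiply the probabilities.
        result = 1.0
        for p in probs:
            result *= p
        return result
    if node.kind == "OR":
        # At least one child occurs: 1 minus the chance that none occurs.
        none_occur = 1.0
        for p in probs:
            none_occur *= 1.0 - p
        return 1.0 - none_occur
    raise ValueError(f"unknown gate kind: {node.kind}")


# Hypothetical tree: the hazard occurs if a sensor fault goes undetected,
# or if the software issues a bad command on its own.
TOP = Gate("OR", [
    Gate("AND", [Basic("sensor fault", 0.01), Basic("monitor misses fault", 0.1)]),
    Basic("software issues bad command", 0.001),
])

if __name__ == "__main__":
    print(f"P(top event) = {probability(TOP):.6f}")
```

The recursion mirrors how a fault tree is read: from the top event down to its contributing faults.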







      Resilience Engineering            Safety Engineering
      ----------------------            ------------------
      Organization Centric              System Centric
      Safety as a Core Value            Safety as a Thing
      Failure Anticipation              Failure Reduction
      Foresight                         Probabilistic
      Organizational Adaptability       Mathematics & Analysis

      Shared goal (center of diagram): Safe & Reliable Systems Operation








      Reliability Engineering

            • Developing systems which reach the market at the right time, at
            an acceptable cost with satisfactory reliability and availability
            • Concerned primarily with the characteristics of a system expressed by
            the probability that the system will perform its required function in the
            specified manner in a given period of time in a specified set of conditions
            • Achieving the correct balance based on customer needs of
            reliability/availability, delivery time, cost, and ease of maintenance
            • Quantitative characterization of expected use & quality characteristics
            • Treats safety as a subset of reliability

                                                                                 [6] Musa
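
The probability in the second bullet can be made concrete with a small sketch. It assumes a constant failure intensity over the period of interest, so that reliability follows the exponential form R(t) = exp(-λt). The slide does not state a formula, and the function names and numbers below are illustrative only.

```python
import math


def reliability(failure_intensity: float, exec_hours: float) -> float:
    """Probability of failure-free operation for exec_hours.

    Assumes a constant failure intensity (failures per execution hour).
    """
    if failure_intensity < 0 or exec_hours < 0:
        raise ValueError("failure intensity and time must be non-negative")
    return math.exp(-failure_intensity * exec_hours)


def required_failure_intensity(target_reliability: float, exec_hours: float) -> float:
    """Largest constant failure intensity that still meets the target."""
    if not 0 < target_reliability <= 1 or exec_hours <= 0:
        raise ValueError("target must be in (0, 1] and time must be positive")
    return -math.log(target_reliability) / exec_hours


if __name__ == "__main__":
    # Illustrative numbers: 0.01 failures/exec hr over a 10-hour period.
    print(f"R(10 h) = {reliability(0.01, 10):.4f}")
    print(f"Intensity needed for R = 0.99 over 10 h: "
          f"{required_failure_intensity(0.99, 10):.6f} failures/exec hr")
```

The second function runs the relationship in reverse, which is how a reliability objective is usually turned into a failure intensity objective.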







      Failure Obviation Engineering
            • A new term based on Petroski’s concept of failure elimination in
            engineering
            • A focus on failure can lead to success, as the most successful
            improvements in a system are those that focus on the limitations
            and failures
            • A reliance on successful precedents can lead to failure.
            • Success is not simply the absence of failure; it also masks
            potential modes of failure
            • Success and failure are intertwined
            • Intensive analysis of failure case studies
                                                                         [7][8] Petroski







      Reliability v. Failure Intensity                                  [6] Musa

           Figure: two curves plotted against Time (exec hr):
                • Failure Intensity (failures/exec hr)
                • Reliability (axis maximum 1.0)
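
The slide does not say which model produced the curves. One common choice is Musa's basic execution time model, in which failure intensity decays exponentially with execution time: λ(τ) = λ0 · exp(-(λ0/ν0) · τ). The sketch below assumes that model, and its parameter values are hypothetical.

```python
import math


def failure_intensity(tau: float, lambda0: float, nu0: float) -> float:
    """Failure intensity (failures/exec hr) after tau execution hours.

    lambda0: initial failure intensity; nu0: total expected failures.
    """
    return lambda0 * math.exp(-(lambda0 / nu0) * tau)


def expected_failures(tau: float, lambda0: float, nu0: float) -> float:
    """Mean number of failures experienced by execution time tau."""
    return nu0 * (1.0 - math.exp(-(lambda0 / nu0) * tau))


def reliability(tau: float, mission_hours: float, lambda0: float, nu0: float) -> float:
    """Probability of a failure-free mission started after tau hours of testing.

    Assumes the intensity stays fixed at its value at tau during the mission
    (no further fault removal).
    """
    return math.exp(-failure_intensity(tau, lambda0, nu0) * mission_hours)


if __name__ == "__main__":
    LAMBDA0, NU0 = 10.0, 100.0  # hypothetical parameters
    for tau in (0, 10, 25, 50, 100):
        print(f"tau={tau:>3} h  intensity={failure_intensity(tau, LAMBDA0, NU0):8.4f}"
              f"  R(1 h)={reliability(tau, 1.0, LAMBDA0, NU0):.4f}")
```

Under these assumptions, failure intensity falls as execution time accumulates and the reliability of a fixed-length mission rises toward 1.0.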







      Failure Obviation Engineering     Reliability Engineering
      -----------------------------     -----------------------
      Failure Centric                   Success Centric
      Failure as Learning               Success as Learning
      Anti-Patterns                     Patterns
      Case Studies                      Operational Profiles
      Organizational & System Focus     System Focus

      Shared goal (center of diagram): Safe & Reliable Systems Operation








      Diagram: four disciplines surrounding a central "Safe System"

           • Reliability Engineering
           • Resilience Engineering
           • Safety Engineering
           • Failure Obviation Engineering







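      Diagram: four disciplines surrounding a central "Safe System"

           • Reliability Engineering
           • Resilience Engineering
           • Safety Engineering
           • Failure Obviation Engineering

      Traditional Focus: Reliability Engineering + Safety Engineering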







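      Diagram: four disciplines surrounding a central "Safe System"

           • Reliability Engineering
           • Resilience Engineering
           • Safety Engineering
           • Failure Obviation Engineering

      My Focus: Resilience Engineering + Failure Obviation Engineering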







Leading & Seminal Researchers (grouped around a central "Safe System")

      Reliability Engineering:        John Musa, Debra Herrmann, David Smith
      Resilience Engineering:         Erik Hollnagel, David Woods, Nancy Leveson
      Safety Engineering:             Nancy Leveson, Shari Lawrence Pfleeger, Richard Stephans
      Failure Obviation Engineering:  Henry Petroski, Charles Perrow, Dietrich Dörner








    My Research Methodologies

          - Intensive investigations into case studies of software-based
          accidents in which software was a leading or contributing factor
                     - NTSB Accident Reports
                     - International Accident Reports
                     - NASA & ESA Accident Reports
                     - Military Accident Reports
          - Both accidents and incidents investigated, looking for failure of
          systems, not just high casualty counts
          - One or more Delphi studies
                     - Civilian & military experts





    My Research Goals

          - Investigate & discover all possible cases involving software
          - Inspect thousands of reports to develop a software accident
          database for analysis
          - Investigate beyond traditional case examples (Therac-25,
          Ariane 5, Mars Polar Lander, Patriot Missile System, etc.)
          - Discover overlooked case studies (e.g., “pilot” error or “system”
          error that is really software error)
          - Trend analysis and common threads
          - Using the above results, develop a resilience model
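
One possible shape for such an accident database is sketched below. It shows a minimal record type and two tallies: accidents whose listed cause hides a software factor, and the most common software factors. The field names and factor tags are hypothetical, and the sample rows merely paraphrase the case study examples in this deck plus one invented control record.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AccidentRecord:
    """One accident report, reduced to the fields needed for trend analysis."""
    name: str
    listed_cause: str
    software_factors: Tuple[str, ...] = ()  # hypothetical tags, empty if none found

    @property
    def software_involved(self) -> bool:
        return bool(self.software_factors)


def reclassified(records: List[AccidentRecord]) -> List[AccidentRecord]:
    """Records where software contributed but the listed cause does not say so."""
    return [r for r in records
            if r.software_involved and "software" not in r.listed_cause.lower()]


def listed_cause_counts(records: List[AccidentRecord]) -> Counter:
    """How often each listed cause masks a software factor."""
    return Counter(r.listed_cause for r in reclassified(records))


def factor_counts(records: List[AccidentRecord]) -> Counter:
    """Frequency of each software factor tag across all records."""
    return Counter(f for r in records for f in r.software_factors)


RECORDS = [
    AccidentRecord("Air New Zealand DC-10, Mt. Erebus", "pilot error",
                   ("navigation data", "hci")),
    AccidentRecord("American Airlines, Colombia", "pilot error",
                   ("input interpretation",)),
    AccidentRecord("AdamAir Flight 574", "pilot error",
                   ("autopilot disengagement",)),
    AccidentRecord("Invented control record", "mechanical failure"),
]

if __name__ == "__main__":
    print(listed_cause_counts(RECORDS).most_common())
    print(factor_counts(RECORDS).most_common())
```

With thousands of records, the same tallies would point to the common threads the research goals describe.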







Case Study Example #1

Air New Zealand DC-10 crash into Mt. Erebus, Antarctica, 1979
257 Fatalities, Total Hull Loss
Primary Listed Cause:
           - Pilot error due to low altitude and whiteout effects
Discovered Issue:
           - Navigation software programmed incorrectly; pilots unaware of this issue
           - Pilots were not where they thought they were geographically
           - Software HCI issue, software system protocol issues





Case Study Example #2
     American Airlines Flight 965 crashes into mountain in Colombia
     159 Fatalities, 4 Serious Injuries, Total Hull Loss
     Primary Listed Cause: Pilot Error during night flight
     Discovered Issue:
              - Flight management system software misinterpreted pilot
              input and turned the aircraft in the wrong direction
              - Internal memo from Honeywell Air Transport Systems
              to Jeppesen, the software manufacturer, 11 months
              before the accident:
                         “It could cause a large incident if these
                         [software] problems in the flight support
                         system are left unresolved.”





Case Study Example #3
AdamAir Flight 574, 1 Jan 2007: crashed into the sea near Indonesia
102 Fatalities, Total Hull Loss
Listed Cause: Pilot Error, Spatial Disorientation
Major Contributing Cause:
           - Failure of Inertial Reference System
           - Software disengaged autopilot unbeknownst to pilots
           - Plane rolled right 35 degrees after software autopilot disengagement
           - Pilots could not recover from roll





Potential Research Papers

   “Applying Resilience Engineering to Safety Critical Software Systems”


   “Failure Obviation Engineering: A New Concept in Developing Safe Software”


   “Resilience and Failure Obviation Engineering: A New Paradigm for Developing
   Safety Critical Software Systems”


   “Current Trends in Safety Critical Software Failures”








    Interesting note:




                                                                “Silver Bullet?”




                                                           We’ll see…





References


  [1] Petroski, H. (1992). To Engineer is Human. Vintage Books. New York.
  [2] Hollnagel, E., Woods, D., & Leveson, N., Eds. (2006). Resilience Engineering: Concepts and Precepts. Ashgate. Burlington, VT.
  [3] Air Force Institute of Technology. (2001). VV&T Class Slides.
  [4] US Dept. of Defense. (1999). Joint Warfare Application Seminar.
  [5] Leveson, N. (1995). Safeware. Addison-Wesley. New York.
  [6] Musa, J. (2004). Software Reliability Engineering, 2nd Ed. AuthorHouse. Bloomington, IN.
  [7] Petroski, H. (2006). Success Through Failure: The Paradox of Design. Princeton Press. NJ.
  [8] Petroski, H. (1992). To Engineer is Human: The Role of Failure in Successful Design. Vintage. NY.




SW 4936                                US Naval Postgraduate School                                       25

More Related Content

What's hot

CS 5032 L7 dependability engineering 2013
CS 5032 L7 dependability engineering 2013CS 5032 L7 dependability engineering 2013
CS 5032 L7 dependability engineering 2013Ian Sommerville
 
2012A8PS309P_AbhishekKumar_FinalReport
2012A8PS309P_AbhishekKumar_FinalReport2012A8PS309P_AbhishekKumar_FinalReport
2012A8PS309P_AbhishekKumar_FinalReportabhishekroushan
 
CS 5032 L12 security testing and dependability cases 2013
CS 5032 L12  security testing and dependability cases 2013CS 5032 L12  security testing and dependability cases 2013
CS 5032 L12 security testing and dependability cases 2013Ian Sommerville
 
CS 5032 L4 requirements engineering 2013
CS 5032 L4 requirements engineering 2013CS 5032 L4 requirements engineering 2013
CS 5032 L4 requirements engineering 2013Ian Sommerville
 
CS 5032 L1 critical socio-technical systems 2013
CS 5032 L1 critical socio-technical systems 2013CS 5032 L1 critical socio-technical systems 2013
CS 5032 L1 critical socio-technical systems 2013Ian Sommerville
 
Computers 09
Computers 09Computers 09
Computers 09AkiTenshi
 
Computers assesment
Computers assesmentComputers assesment
Computers assesmentDom9533
 
Computers Health and Safety
Computers Health and SafetyComputers Health and Safety
Computers Health and SafetyWildOakForrest
 
Computers 09
Computers 09Computers 09
Computers 09j45a45ck
 

What's hot (12)

CS 5032 L7 dependability engineering 2013
CS 5032 L7 dependability engineering 2013CS 5032 L7 dependability engineering 2013
CS 5032 L7 dependability engineering 2013
 
2012A8PS309P_AbhishekKumar_FinalReport
2012A8PS309P_AbhishekKumar_FinalReport2012A8PS309P_AbhishekKumar_FinalReport
2012A8PS309P_AbhishekKumar_FinalReport
 
CS 5032 L12 security testing and dependability cases 2013
CS 5032 L12  security testing and dependability cases 2013CS 5032 L12  security testing and dependability cases 2013
CS 5032 L12 security testing and dependability cases 2013
 
CS 5032 L4 requirements engineering 2013
CS 5032 L4 requirements engineering 2013CS 5032 L4 requirements engineering 2013
CS 5032 L4 requirements engineering 2013
 
Presentation
PresentationPresentation
Presentation
 
CS 5032 L1 critical socio-technical systems 2013
CS 5032 L1 critical socio-technical systems 2013CS 5032 L1 critical socio-technical systems 2013
CS 5032 L1 critical socio-technical systems 2013
 
Computers 09
Computers 09Computers 09
Computers 09
 
Ch12 safety engineering
Ch12 safety engineeringCh12 safety engineering
Ch12 safety engineering
 
Computers assesment
Computers assesmentComputers assesment
Computers assesment
 
Computers Health and Safety
Computers Health and SafetyComputers Health and Safety
Computers Health and Safety
 
Computers 09
Computers 09Computers 09
Computers 09
 
Computers 09
Computers 09Computers 09
Computers 09
 

Similar to Resilience And Failure Obviation Software Engineering

Understand Reliability Engineering, Scope, Use case, Methods, Training
Understand Reliability Engineering, Scope, Use case, Methods, TrainingUnderstand Reliability Engineering, Scope, Use case, Methods, Training
Understand Reliability Engineering, Scope, Use case, Methods, TrainingBryan Len
 
Probabilistic design for reliability (pdfr) in electronics part1of2
Probabilistic design for reliability (pdfr) in electronics part1of2Probabilistic design for reliability (pdfr) in electronics part1of2
Probabilistic design for reliability (pdfr) in electronics part1of2ASQ Reliability Division
 
TECHNICAL REPORTCMUSEI-99-TR-017ESC-TR-99-017Operat.docx
TECHNICAL REPORTCMUSEI-99-TR-017ESC-TR-99-017Operat.docxTECHNICAL REPORTCMUSEI-99-TR-017ESC-TR-99-017Operat.docx
TECHNICAL REPORTCMUSEI-99-TR-017ESC-TR-99-017Operat.docxmattinsonjanel
 
Four things that are almost guaranteed to reduce the reliability of a softwa...
Four things that are almost guaranteed to reduce the reliability of a softwa...Four things that are almost guaranteed to reduce the reliability of a softwa...
Four things that are almost guaranteed to reduce the reliability of a softwa...Ann Marie Neufelder
 
Four things that are almost guaranteed to reduce the reliability of a softwa...
Four things that are almost guaranteed to reduce the reliability of a softwa...Four things that are almost guaranteed to reduce the reliability of a softwa...
Four things that are almost guaranteed to reduce the reliability of a softwa...Ann Marie Neufelder
 
DEVELOPMENT AND EVALUATION OF A CONTACT CENTER APPLICATION SYSTEM TO INTEGRAT...
DEVELOPMENT AND EVALUATION OF A CONTACT CENTER APPLICATION SYSTEM TO INTEGRAT...DEVELOPMENT AND EVALUATION OF A CONTACT CENTER APPLICATION SYSTEM TO INTEGRAT...
DEVELOPMENT AND EVALUATION OF A CONTACT CENTER APPLICATION SYSTEM TO INTEGRAT...IJCNCJournal
 
Strategic Maintenance Brozine_8_6_2008
Strategic Maintenance Brozine_8_6_2008Strategic Maintenance Brozine_8_6_2008
Strategic Maintenance Brozine_8_6_2008Kevin Oswald
 
How the CC Harmonizes with Secure Software Development Lifecycle
How the CC Harmonizes with Secure Software Development LifecycleHow the CC Harmonizes with Secure Software Development Lifecycle
How the CC Harmonizes with Secure Software Development LifecycleSeungjoo Kim
 
A Tale of CI Build Failures: an Open Source and a Financial Organization Pers...
A Tale of CI Build Failures: an Open Source and a Financial Organization Pers...A Tale of CI Build Failures: an Open Source and a Financial Organization Pers...
A Tale of CI Build Failures: an Open Source and a Financial Organization Pers...Sebastiano Panichella
 
Creating and managing test environments best practices for test infrastructur...
Creating and managing test environments best practices for test infrastructur...Creating and managing test environments best practices for test infrastructur...
Creating and managing test environments best practices for test infrastructur...Knoldus Inc.
 
ISSRE 2008 Trip Report
ISSRE 2008 Trip ReportISSRE 2008 Trip Report
ISSRE 2008 Trip ReportBob Binder
 
11th Website Security Statistics -- Presentation Slides (Q1 2011)
11th Website Security Statistics -- Presentation Slides (Q1 2011)11th Website Security Statistics -- Presentation Slides (Q1 2011)
11th Website Security Statistics -- Presentation Slides (Q1 2011)Jeremiah Grossman
 
Feature Analysis of Estimated Causes of Failures in Medical Device Software a...
Feature Analysis of Estimated Causes of Failures in Medical Device Software a...Feature Analysis of Estimated Causes of Failures in Medical Device Software a...
Feature Analysis of Estimated Causes of Failures in Medical Device Software a...Yoshio SAKAI
 
Zen and the art of safety engineering
Zen and the art of safety engineeringZen and the art of safety engineering
Zen and the art of safety engineeringEric Verhulst
 
FMEA: The Good, The Bad, and The Ugly
FMEA: The Good, The Bad, and The UglyFMEA: The Good, The Bad, and The Ugly
FMEA: The Good, The Bad, and The UglyCheryl Tulkoff
 
Past and future of integrity based attacks in ics environments
Past and future of integrity based attacks in ics environmentsPast and future of integrity based attacks in ics environments
Past and future of integrity based attacks in ics environmentsJoe Slowik
 
Design approach for fault
Design approach for faultDesign approach for fault
Design approach for faultVLSICS Design
 
Reliability Engineering and Terotechnology
Reliability Engineering and TerotechnologyReliability Engineering and Terotechnology
Reliability Engineering and TerotechnologyChristian Enoval
 

Similar to Resilience And Failure Obviation Software Engineering (20)

Understand Reliability Engineering, Scope, Use case, Methods, Training
Understand Reliability Engineering, Scope, Use case, Methods, TrainingUnderstand Reliability Engineering, Scope, Use case, Methods, Training
Understand Reliability Engineering, Scope, Use case, Methods, Training
 
Introdution to POF reliability methods
Introdution to POF reliability methodsIntrodution to POF reliability methods
Introdution to POF reliability methods
 
Probabilistic design for reliability (pdfr) in electronics part1of2
Probabilistic design for reliability (pdfr) in electronics part1of2Probabilistic design for reliability (pdfr) in electronics part1of2
Probabilistic design for reliability (pdfr) in electronics part1of2
 
TECHNICAL REPORTCMUSEI-99-TR-017ESC-TR-99-017Operat.docx
TECHNICAL REPORTCMUSEI-99-TR-017ESC-TR-99-017Operat.docxTECHNICAL REPORTCMUSEI-99-TR-017ESC-TR-99-017Operat.docx
TECHNICAL REPORTCMUSEI-99-TR-017ESC-TR-99-017Operat.docx
 
Four things that are almost guaranteed to reduce the reliability of a softwa...
Four things that are almost guaranteed to reduce the reliability of a softwa...Four things that are almost guaranteed to reduce the reliability of a softwa...
Four things that are almost guaranteed to reduce the reliability of a softwa...
 
Four things that are almost guaranteed to reduce the reliability of a softwa...
Four things that are almost guaranteed to reduce the reliability of a softwa...Four things that are almost guaranteed to reduce the reliability of a softwa...
Four things that are almost guaranteed to reduce the reliability of a softwa...
 
DEVELOPMENT AND EVALUATION OF A CONTACT CENTER APPLICATION SYSTEM TO INTEGRAT...
DEVELOPMENT AND EVALUATION OF A CONTACT CENTER APPLICATION SYSTEM TO INTEGRAT...DEVELOPMENT AND EVALUATION OF A CONTACT CENTER APPLICATION SYSTEM TO INTEGRAT...
DEVELOPMENT AND EVALUATION OF A CONTACT CENTER APPLICATION SYSTEM TO INTEGRAT...
 
Strategic Maintenance Brozine_8_6_2008
Strategic Maintenance Brozine_8_6_2008Strategic Maintenance Brozine_8_6_2008
Strategic Maintenance Brozine_8_6_2008
 
How the CC Harmonizes with Secure Software Development Lifecycle
How the CC Harmonizes with Secure Software Development LifecycleHow the CC Harmonizes with Secure Software Development Lifecycle
How the CC Harmonizes with Secure Software Development Lifecycle
 
A Tale of CI Build Failures: an Open Source and a Financial Organization Pers...

Resilience And Failure Obviation Software Engineering

  • 1. 0010101101100110 1011000110100101 1101011100101001 Resilience & Failure Obviation Based Software Engineering 0101100011101110 A Resilience & Failure Obviation Based Approach to Software Safety Engineering Donna A. Dulo US Department of Army 25 FEB 09 SW 4936 US Naval Postgraduate School 1
  • 2. “The concept of failure…is central to understanding engineering, for engineering design has as its first and foremost objective the obviation of failure.” - Henry Petroski [1]
      “Resilience is the ability of systems to prevent or adapt to changing conditions in order to maintain control over a system property…to ensure safety…and to avoid failure.” - Hollnagel, Woods, & Leveson [2]
  • 3. Two Separate Concepts of General Engineering…
      - Resilience Engineering
      - Failure Obviation Engineering
      …Applied to Software Engineering
  • 4. % of Functions Performed by Software, by Weapon System [3] AFIT
      - F-4 (1960): 8%
      - A-7 (1964): 10%
      - F-111 (1970): 20%
      - F-15 (1975): 35%
      - F-16 (1982): 45%
      - B-2 (1990): 65%
      - F-22 (2000): 80%
  • 5. DoD Software Success Rate [4] DoD
      [chart: segments of 29%, 46%, 20%, 2%, 3%; legend: Not Used, Cancelled, Modified, Minor Changes, Used As Is]
  • 6. Resilience Engineering
      - A paradigm for safety management and design that focuses on helping organizations cope with complexity under pressure to achieve success
      - A resilient organization treats safety as a core value, not a commodity that can be counted
      - Contrasts with current safety engineering paradigms of tabulating error
      - Invests in anticipating the changing potential for failure
      - Creates foresight to anticipate the changing shape of risk before failure occurs
      [2] Hollnagel, et al.
  • 7. Safety Engineering
      - Focuses on systems that will execute within a specified context without contributing to hazards
      - Central concept: mathematical analysis and model-based identification of system component faults, failures, and errors
      - System hazard reduction and elimination
      - Methodologies: Fault Trees; Hazard & Operability Analysis Models; Qualitative & Probabilistic Models
      [5] Leveson
  • 8. Resilience Engineering / Safety Engineering (shared goal: Safe & Reliable Systems Operation)
      - Organization Centric / System Centric
      - Safety as a Core Value / Safety as a Thing
      - Failure Anticipation / Failure Reduction
      - Foresight / Probabilistic
      - Organizational Adaptability / Mathematics & Analysis
  • 9. Reliability Engineering
      - Developing systems that reach the market at the right time, at an acceptable cost, with satisfactory reliability and availability
      - Concerned primarily with the characteristics of a system expressed by the probability that the system will perform its required function in the specified manner, in a given period of time, under a specified set of conditions
      - Achieving the correct balance, based on customer needs, of reliability/availability, delivery time, cost, and ease of maintenance
      - Quantitative characterization of expected use & quality characteristics
      - Treats safety as a subset of reliability
      [6] Musa
  • 10. Failure Obviation Engineering
      - A new term based on Petroski’s concept of failure elimination in engineering
      - A focus on failure can lead to success, as the most successful improvements in a system are those that focus on its limitations and failures
      - A reliance on successful precedents can lead to failure
      - Success is not simply the absence of failure; it also masks potential modes of failure
      - Success and failure are intertwined
      - Intensive analysis of failure case studies
      [7][8] Petroski
  • 11. Reliability v. Failure Intensity [6] Musa
      [figure: reliability (maximum 1.0) and failure intensity (failures/exec hr) plotted against time (exec hr)]
  • 12. Failure Obviation Engineering / Reliability Engineering (shared goal: Safe & Reliable Systems Operation)
      - Failure Centric / Success Centric
      - Failure as Learning / Success as Learning
      - Anti-Patterns / Patterns
      - Case Studies / Operational Profiles
      - Organizational & System Focus / System Focus
  • 13. [diagram: Reliability Engineering, Resilience Engineering, Safety Engineering, and Failure Obviation Engineering arranged around a central Safe System]
  • 14. [diagram: the four disciplines around a central Safe System, with Reliability Engineering + Safety Engineering marked "Traditional Focus"]
  • 15. [diagram: the four disciplines around a central Safe System, with Resilience Engineering + Failure Obviation Engineering marked "My Focus"]
  • 16. Leading & Seminal Researchers
      - Reliability Engineering: John Musa, Debra Hermann, David Smith
      - Resilience Engineering: Erik Hollnagel, David Woods, Nancy Leveson
      - Safety Engineering: Nancy Leveson, Sheri Lawrence Pfleeger, Richard Stephans
      - Failure Obviation Engineering: Henry Petroski, Charles Perrow, Dietrich Dorner
  • 17. My Research Methodologies
      - Intensive investigations into case studies of software-based accidents, with software being a leading or contributing factor
      - NTSB Accident Reports
      - International Accident Reports
      - NASA & ESU Accident Reports
      - Military Accident Reports
      - Accidents & incidents investigated, looking for failure of systems, not just high casualty counts
      - One or more Delphi studies
      - Civilian & military experts
  • 18. My Research Goals
      - Investigate & discover all possible cases involving software
      - Inspect thousands of reports to develop a software accident database for analysis
      - Investigate beyond traditional case examples (Therac-25, Ariane 5, Mars Polar Lander, Patriot Missile System, etc.)
      - Discover overlooked case studies (i.e., “Pilot” error or “System” error that is really software error)
      - Trend analysis and common threads
      - Using the above results, develop a resilience model
  • 19. Case Study Example #1: Air New Zealand DC-10 crash into Mt. Erebus, Antarctica, 1979
      257 Fatalities, Total Hull Loss
      Primary Listed Cause:
      - Pilot Error due to low altitude and whiteout effects
      Discovered Issue:
      - Navigation software programmed incorrectly; pilots unaware of this issue
      - Pilots were not where they thought they were geographically
      - Software HCI issue, software system protocol issues
  • 20. Case Study Example #2: American Airlines Flight 965 crashes into mountain in Colombia
      159 Fatalities, 4 Serious Injuries, Total Hull Loss
      Primary Listed Cause: Pilot Error during night flight
      Discovered Issue:
      - Flight management system software misinterpreted pilot input and turned aircraft in wrong direction
      - Internal memo from Honeywell Air Transport Systems to Jeppesen, the software manufacturer, 11 months before accident: “It could cause a large incident if these [software] problems in the flight support system are left unresolved.”
  • 22. Case Study Example #3: AdamAir Flight 574, 1 Jan 2007
      Crashed into sea near Indonesia
      102 Fatalities, Total Hull Loss
      Listed Cause: Pilot Error, Spatial Disorientation
      Major Contributing Cause:
      - Failure of Inertial Reference System
      - Software disengaged autopilot unbeknownst to pilots
      - Plane rolled right 35 degrees from software autopilot disengagement
      - Pilots could not recover from roll
  • 23. Potential Research Papers
      - “Applying Resilience Engineering to Safety Critical Software Systems”
      - “Failure Obviation Engineering: A New Concept in Developing Safe Software”
      - “Resilience and Failure Obviation Engineering: A New Paradigm for Developing Safety Critical Software Systems”
      - “Current Trends in Safety Critical Software Failures”
  • 24. Interesting note: “Silver Bullet?” We’ll see…
  • 25. References
      [1] Petroski, H. (1992). To Engineer is Human. Vintage Books. New York.
      [2] Hollnagel, E., Woods, D., & Leveson, N., Eds. (2006). Resilience Engineering: Concepts and Precepts. Ashgate. Burlington, VT.
      [3] Air Force Institute of Technology. (2001). VV&T Class Slides.
      [4] US Dept. of Defense. (1999). Joint Warfare Application Seminar.
      [5] Leveson, N. (1995). Safeware. Addison-Wesley. New York.
      [6] Musa, J. (2004). Software Reliability Engineering, 2nd Ed. Author House. Bloomington, IN.
      [7] Petroski, H. (2006). Success Through Failure: The Paradox of Design. Princeton Press. NJ.
      [8] Petroski, H. (1992). To Engineer is Human: The Role of Failure in Successful Design. Vintage. NY.
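
Slide 9 defines reliability as the probability that a system performs its required function for a given period of time, and slide 11 plots reliability against failure intensity. A minimal sketch of how the two quantities relate, assuming the failure intensity stays constant over the interval of interest, so that reliability follows the exponential form R(t) = exp(-λt). The constant-intensity assumption and the function names are illustrative choices made here, not taken from the slides.

```python
import math


def reliability(failure_intensity: float, exec_hours: float) -> float:
    """Probability of failure-free operation over exec_hours.

    Assumes a constant failure intensity (failures per execution hour),
    which gives R(t) = exp(-lambda * t).
    """
    if failure_intensity < 0 or exec_hours < 0:
        raise ValueError("failure_intensity and exec_hours must be non-negative")
    return math.exp(-failure_intensity * exec_hours)


def failure_intensity_for(target_reliability: float, exec_hours: float) -> float:
    """Failure intensity needed to reach target_reliability over exec_hours.

    Inverts R(t) = exp(-lambda * t) to lambda = ln(1 / R) / t.
    """
    if not 0 < target_reliability <= 1:
        raise ValueError("target_reliability must be in (0, 1]")
    if exec_hours <= 0:
        raise ValueError("exec_hours must be positive")
    return math.log(1.0 / target_reliability) / exec_hours


if __name__ == "__main__":
    # Hypothetical numbers chosen only to exercise the functions.
    print(reliability(failure_intensity=0.01, exec_hours=10.0))
    print(failure_intensity_for(target_reliability=0.99, exec_hours=1.0))
```

Under this assumption, a failure intensity of 0.01 failures/exec hr over 10 execution hours gives a reliability of about 0.905, and reliability approaches 1.0 as failure intensity falls toward zero, which matches the opposing trends of the two curves on slide 11.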