Systems failure – a socio-technical perspective




Human Failure, LSCITS, EngD course in Socio-technical Systems, 2012
Complex software systems
   •      Multi-purpose. Organisational systems that support
          different functions within an organisation
   •      System of systems. Usually distributed and normally
          constructed by integrating existing
          systems/components/services
   •      Unlimited. Not subject to limitations derived from the
          laws of physics (so, no natural constraints on their
          size)
   •      Data intensive. System data orders of magnitude
          larger than code; long-lifetime data
   •      Dynamic. Changing quickly in response to changes
          in the business environment
Systems of systems
   •      Operational independence
   •      Managerial independence
   •      Multiple stakeholder viewpoints
   •      Evolutionary development
   •      Emergent behaviour
   •      Geographic distribution
Complex system realities
  •       There is no definitive specification of what the system
          should ‘do’ and it is practically impossible to create
          such a specification
  •       The complexity of the system is such that it is not
          ‘understandable’ as a whole
  •       It is likely that, at all times, some parts of the system
          will not be fully operational
  •       Actors responsible for different parts of the system
          are likely to have conflicting goals


System failure




System dependability model

   •      System fault. A system characteristic that can (but need
          not) lead to a system error
   •      System error. An erroneous system state that can (but need
          not) lead to a system failure
   •      System failure. Externally-observed, unexpected and
          undesirable system behaviour
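
The fault, error and failure distinction can be made concrete with a small sketch. The `BedCounter` class and its deliberately planted fault are hypothetical, invented here for illustration only; they are not taken from any real hospital system.

```python
class BedCounter:
    """Hypothetical bed counter with a deliberate fault in discharge()."""

    def __init__(self, empty_beds: int) -> None:
        self.empty_beds = empty_beds

    def admit(self) -> None:
        self.empty_beds -= 1

    def discharge(self) -> None:
        # FAULT: the counter is never incremented on discharge.
        # The fault stays dormant until discharge() is actually called.
        pass

    def report(self) -> int:
        return self.empty_beds


counter = BedCounter(empty_beds=3)
counter.admit()       # fault not triggered: state is correct (2 empty beds)
counter.discharge()   # fault triggered: state should be 3 but is still 2

# ERROR: the internal state no longer matches reality.
state_is_erroneous = counter.empty_beds != 3

# A FAILURE arises only if the wrong value is observed from outside
# and judged to be unexpected and undesirable.
observed = counter.report()
```

If `discharge()` is never called, the fault never produces an error; if nobody reads the report, the error never becomes an externally observed failure.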



A hospital system
  •       A hospital system is designed to maintain information about
          available beds for incoming patients and to provide information
          about the number of beds to the admissions unit.
  •       It is assumed that the hospital has a number of empty beds and
          this changes over time. The variable B reflects the number of
          empty beds known to the system.
  •       Sometimes the system reports that the number of empty beds is
          the actual number available; sometimes the system reports that
          fewer than the actual number are available.
  •       In circumstances where the system reports that an incorrect
          number of beds are available, is this a failure?



What is failure?
   •      Technical, engineering view: a failure is ‘a deviation
          from a specification’.
   •      An oracle can examine a specification, observe a
          system’s behaviour and detect failures.
   •      Failure is an absolute - the system has either failed
          or it hasn’t
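
A minimal sketch of the oracle idea, assuming a specification simple enough to be written as a function. The names `oracle` and `report_spec`, and the sample runs, are illustrative assumptions rather than part of any real system.

```python
from typing import Callable, List, Tuple


def oracle(spec: Callable[[int], int], runs: List[Tuple[int, int]]) -> List[bool]:
    """For each (input, observed output) pair, flag a deviation from the spec."""
    return [observed != spec(given) for given, observed in runs]


# Hypothetical specification: report exactly the number of empty beds.
def report_spec(actual_empty_beds: int) -> int:
    return actual_empty_beds


# Each pair is (actual empty beds, number the system reported).
runs = [(6, 6), (6, 4), (0, 0)]
verdicts = oracle(report_spec, runs)
```

Under this view each verdict is binary: the second run deviates from the specification and is therefore counted as a failure, whatever its practical consequences.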
Bed management system
   •      The percentage of system users who considered the
          system’s incorrect reporting of the number of
          available beds to be a failure was 0%.
   •      Mostly, the number did not matter so long as it was
          greater than 1. What mattered was whether or not
          patients could be admitted to the hospital.
   •      When the hospital was very busy (available beds =
          0), then people understood that it was practically
          impossible for the system to be accurate.
   •      They used other methods to find out whether or not a
          bed was available for an incoming patient.

Failure is a judgement
•     Specifications are a gross simplification of reality for
      complex systems.
•     Users don’t read and don’t care about specifications
•     Whether or not system behaviour should be considered
      to be a failure, depends on the observer’s judgement
•     This judgement depends on:
     –      The observer’s expectations
     –      The observer’s knowledge and experience
     –      The observer’s role
     –      The observer’s context or situation
     –      The observer’s authority
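
To illustrate how the same behaviour can be judged differently, the sketch below applies two observers' expectations to one observation. Both roles and the rules they apply are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    reported_beds: int  # number of empty beds shown by the system
    actual_beds: int    # beds really empty on the wards


def clerk_sees_failure(o: Observation) -> bool:
    # Assumed expectation: only a wrongly refused admission matters.
    return o.reported_beds == 0 and o.actual_beds > 0


def auditor_sees_failure(o: Observation) -> bool:
    # Assumed expectation: any deviation from the true count matters.
    return o.reported_beds != o.actual_beds


observation = Observation(reported_beds=4, actual_beds=6)
judgements = {
    "admissions clerk": clerk_sees_failure(observation),
    "auditor": auditor_sees_failure(observation),
}
```

The system under-reports by two beds in both cases, yet only the auditor classes the behaviour as a failure.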
Failures are inevitable
•       Technical reasons
      –       When systems are composed of opaque and uncontrolled
              components, the behaviour of these components cannot be
              completely understood
      –       Failures often can be considered to be failures in data rather than
              failures in behaviour

•       Socio-technical reasons
      –       Changing contexts of use mean that the judgement on what
              constitutes a failure changes as the effectiveness of the system in
              supporting work changes
      –       Different stakeholders will interpret the same behaviour in different
              ways because of different interpretations of ‘the problem’


Conflict inevitability
   •      Impossible to establish a set of requirements where
          stakeholder conflicts are all resolved
   •      Therefore, successful operation of a system for one
          set of stakeholders will inevitably mean ‘failure’ for
          another set of stakeholders
   •      Groups of stakeholders in organisations are often in
          perennial conflict (e.g. managers and clinicians in a
          hospital). The support delivered by a system
          depends on the power held at some time by a
          stakeholder group.



Normal failures
   •      ‘Failures’ are not just catastrophic events but
          normal, everyday system behaviour that disrupts
          normal work and that means that people have to
          spend more time on a task than necessary
   •      A system failure occurs when a direct or indirect user
          of a system has to carry out extra work, over and
          above that normally required to carry out some
          task, in response to some inappropriate or
          unexpected system behaviour
   •      This extra work constitutes the cost of recovery from
          system failure
The Swiss Cheese model




Failure trajectories
  •       Failures rarely have a single cause. Generally, they
          arise because several events occur simultaneously
        –         Loss of data in a critical system
              •      User mistypes command and instructs data to be deleted
              •      System does not check and ask for confirmation of destructive
                     action
              •      No backup of data available

  •       A failure trajectory is a sequence of undesirable
          events that coincide in time, usually initiated by some
          human action. It represents a failure in the defensive
          layers in the system
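
The data-loss example can be sketched as a walk through the defensive layers. The function `outcome` and its three boolean parameters are hypothetical simplifications; a real system would have more layers and less clear-cut states.

```python
from itertools import product


def outcome(confirmation_prompt: bool, user_spots_mistake: bool,
            backup_available: bool) -> str:
    """Follow a mistyped delete command through the defensive layers."""
    # Layer 1: the prompt helps only if the user also notices the mistake.
    if confirmation_prompt and user_spots_mistake:
        return "blocked at confirmation"
    # Layer 2: a backup lets the organisation recover after the deletion.
    if backup_available:
        return "data deleted, then restored from backup"
    return "data lost"


# The case described above: no confirmation check and no backup.
slide_example = outcome(False, False, False)

# Enumerate every combination of layer states to see which ones end in loss.
all_cases = list(product([False, True], repeat=3))
losses = [case for case in all_cases if outcome(*case) == "data lost"]
```

Of the eight combinations only three end in data loss, and in each of them more than one defence has failed at the same time.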

Vulnerabilities and defences
 •    Vulnerabilities
     –    Faults in the (socio-technical) system which, if triggered by a
          human or technical error, can lead to system failure
     –    e.g. missing check on input validity

 •    Defences
     –    System features that avoid, tolerate or recover from human
          error
     –    e.g. type checking that disallows assignment of values of
          an incorrect type

 •       When an adverse event happens, the key question is
         not ‘whose fault was it’ but ‘why did the system
         defences fail?’
Reason’s Swiss Cheese Model




Active failures and latent conditions
  •       Active failures
        –      Active failures are the unsafe acts committed by people who are in
               direct contact with the system or failures in the system technology.
        –      Active failures have a direct and usually short-lived effect on the
               integrity of the defences.

  •       Latent conditions
        –      Fundamental vulnerabilities in one or more layers of the socio-
               technical system such as system faults, system and process
               misfit, alarm overload, inadequate maintenance, etc.
        –      Latent conditions may lie dormant within the system for many years
               before they combine with active failures and local triggers to create
               an accident opportunity.


Defensive layers
• Complex IT systems should have many defensive
  layers:
     – some are engineered - alarms, physical barriers, automatic
       shutdowns,
     – others rely on people - surgeons, anaesthetists, pilots, control
       room operators,
     – and others depend on procedures and administrative
       controls.
• In an ideal world, each defensive layer would be intact.
• In reality, they are more like slices of Swiss
  cheese, having many holes, although unlike in the
  cheese, these holes are continually
  opening, shutting, and shifting their location.
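
One way to picture the shifting holes is to model each layer as the set of hazards it currently fails to stop. This is a rough sketch under that assumption; the hazard names and the two snapshots are invented for illustration.

```python
from typing import List, Set


def penetrates(layers: List[Set[str]], hazard: str) -> bool:
    """A hazard gets through only if every layer has a hole for it."""
    return all(hazard in holes for holes in layers)


# Holes in each layer at two moments in time.
# Layer order: engineered checks, people, procedures.
monday = [
    {"mistyped delete"},
    {"bad config"},
    {"mistyped delete", "bad config"},
]
friday = [
    {"mistyped delete"},
    {"mistyped delete", "bad config"},  # operator distracted
    {"mistyped delete"},                # the procedural hole has shifted
]

monday_accident = penetrates(monday, "mistyped delete")
friday_accident = penetrates(friday, "mistyped delete")
```

On Monday every hazard is stopped by at least one layer; by Friday the holes have moved and line up for one hazard, even though no single layer is much worse than before.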
Dynamic vulnerabilities
  •       While some vulnerabilities are static (e.g.
          programming errors), others are dynamic and depend
          on the context where the system is used.
  •       For example
        –      vulnerabilities may be related to human actions whose
               performance is dependent on workload, state of mind, etc. An
               operator may be distracted and forget to check something
        –      vulnerabilities may depend on configuration – checks may
               depend on particular programs being up and running so if
               program A is running in a system then a check may be made
               but if program B is running, then the check is not made
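
The configuration case can be sketched as a check that exists only while a particular program is running. The program names follow the example above; the function names and the valid range 0 to 100 are assumptions made for illustration.

```python
from typing import Set


def input_is_checked(running_programs: Set[str]) -> bool:
    """The validity check lives in program A, so it runs only when A does."""
    return "A" in running_programs


def accept(value: int, running_programs: Set[str]) -> bool:
    if input_is_checked(running_programs):
        return 0 <= value <= 100  # hypothetical valid range
    return True                   # no check is made: anything is accepted


with_a = accept(-5, {"A"})
with_b = accept(-5, {"B"})
```

The same invalid value is rejected under one configuration and accepted under the other, so the vulnerability appears and disappears with the context of use.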


Recovering from failure




Coping with failure
   •      People are good at coping with unexpected situations
          when things go wrong.
         –      They can take the initiative, adopt responsibilities
                and, where necessary, break the rules or step outside
                the normal process of doing things.
         –      People can prioritise and focus on the essence of a
                problem
Recovery strategies
   •      Local knowledge
         –      Who to call; who knows what; where things are
   •      Process reconfiguration
         –      Doing things in a different way from that defined in the ‘standard’
                process
         –      Work-arounds, breaking the rules (safe violations)
   •      Redundancy and diversity
         –      Maintaining copies of information in different forms from that
                maintained in a software system
         –      Informal information annotation
         –      Using multiple communication channels
   •      Trust
         –      Relying on others to cope
Design for recovery
  •       Holistic systems engineering
        –      Software systems design has to be seen as part of a wider
               process of socio-technical systems engineering

  •       We cannot build ‘correct’ systems
        –      We must therefore design systems to allow the broader
               socio-technical systems to recognise, diagnose and recover
               from failures

  •       Extend current systems to support recovery
  •       Develop recovery support systems as an integral part
          of systems of systems

Recovery strategy
•      Designing for recovery is a holistic approach to system design and
       not (just) the identification of ‘recovery requirements’
•      Should support the natural ability of people and organisations to
       cope with problems
       –        Ensure that system design decisions do not increase the amount
                of recovery work required
       –        Make system design decisions that make it easier to recover
                from problems (i.e. reduce extra work required)
            •       Earlier recognition of problems
            •       Visibility to make hypotheses easier to formulate
            •       Flexibility to support recovery actions



Key points
 •    Failures are inevitable in complex systems because
      multiple stakeholders see these systems in different
      ways and because there is no single manager of
      these systems
 •    Failures are a judgement – they are not absolute –
      but depend on the system observer
 •    The Swiss cheese model is a failure model based on
      active failures (trigger events) and latent conditions
      (system vulnerabilities).
 •       People have developed strategies for coping with
         failure and systems should not be designed to make
         coping more difficult.
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 

Dernier (20)

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Socio-technical systems failure (LSCITS EngD 2012)

  • 1. Systems failure – a socio-technical perspective
      Human Failure, LSCITS, EngD course in Socio-technical Systems, 2012 Slide 1
  • 2. Complex software systems
      • Multi-purpose. Organisational systems that support different functions within an organisation
      • System of systems. Usually distributed and normally constructed by integrating existing systems/components/services
      • Unlimited. Not subject to limitations derived from the laws of physics (so, no natural constraints on their size)
      • Data intensive. System data orders of magnitude larger than code; long-lifetime data
      • Dynamic. Changing quickly in response to changes in the business environment
  • 3. Systems of systems
      • Operational independence
      • Managerial independence
      • Multiple stakeholder viewpoints
      • Evolutionary development
      • Emergent behaviour
      • Geographic
  • 4. Complex system realities
      • There is no definitive specification of what the system should ‘do’ and it is practically impossible to create such a specification
      • The complexity of the system is such that it is not ‘understandable’ as a whole
      • It is likely that, at all times, some parts of the system will not be fully operational
      • Actors responsible for different parts of the system are likely to have conflicting goals
  • 5. System failure
  • 6. System dependability model
      • System fault. A system characteristic that can (but need not) lead to a system error
      • System error. An erroneous system state that can (but need not) lead to a system failure
      • System failure. Externally-observed, unexpected and undesirable system behaviour
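The distinction between fault, error and failure can be made concrete with a small sketch. Everything in it is hypothetical and chosen only for illustration: the `Counter` class, its deliberately planted fault (non-positive increments are dropped) and the `run` helper are assumptions, not taken from the slides.

```python
class Counter:
    """Hypothetical component with a planted fault (illustration only)."""

    def __init__(self):
        self.count = 0  # internal system state

    def add(self, n):
        # FAULT: non-positive increments are silently dropped.
        if n > 0:
            self.count += n

    def report(self):
        # The only externally visible behaviour.
        return self.count


def run(increments, observe):
    """Return (error, failure) for one run of the hypothetical component."""
    c = Counter()
    expected = 0
    for n in increments:
        c.add(n)
        expected += n
    # ERROR: the internal state is wrong, whether or not anyone looks.
    error = c.count != expected
    # FAILURE: the wrong state is externally observed.
    failure = observe and c.report() != expected
    return error, failure
```

With `run([1, 2], True)` the fault is present but never triggered. With `run([1, -1], False)` the state is erroneous but nobody observes it. Only `run([1, -1], True)` gives a failure in the sense used above.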
  • 7. A hospital system
      • A hospital system is designed to maintain information about available beds for incoming patients and to provide information about the number of beds to the admissions unit.
      • It is assumed that the hospital has a number of empty beds and this changes over time. The variable B reflects the number of empty beds known to the system.
      • Sometimes the system reports that the number of empty beds is the actual number available; sometimes the system reports that fewer than the actual number are available.
      • In circumstances where the system reports that an incorrect number of beds are available, is this a failure?
  • 8. What is failure?
      • Technical, engineering view: a failure is ‘a deviation from a specification’.
      • An oracle can examine a specification, observe a system’s behaviour and detect failures.
      • Failure is an absolute: the system has either failed or it hasn’t
  • 9. Bed management system
      • The percentage of system users who considered the system’s incorrect reporting of the number of available beds to be a failure was 0%.
      • Mostly, the number did not matter so long as it was greater than 1. What mattered was whether or not patients could be admitted to the hospital.
      • When the hospital was very busy (available beds = 0), then people understood that it was practically impossible for the system to be accurate.
      • They used other methods to find out whether or not a bed was available for an incoming patient.
  • 10. Failure is a judgement
      • Specifications are a gross simplification of reality for complex systems.
      • Users don’t read and don’t care about specifications
      • Whether or not system behaviour should be considered to be a failure depends on the observer’s judgement
      • This judgement depends on:
          – The observer’s expectations
          – The observer’s knowledge and experience
          – The observer’s role
          – The observer’s context or situation
          – The observer’s authority
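The contrast between the specification-based view of failure and an observer's judgement can be sketched with the bed management example. Both functions below are hypothetical. In particular, `admissions_view` assumes that admissions staff care only whether at least one bed is reported when at least one exists, which is a simplification of what the slides describe.

```python
def oracle_view(reported_beds, actual_beds):
    """Engineering view: any deviation from the specification is a failure."""
    return reported_beds != actual_beds


def admissions_view(reported_beds, actual_beds):
    """Assumed user judgement: a failure only if the report gives the wrong
    answer to the question 'can a patient be admitted?'."""
    return (reported_beds > 0) != (actual_beds > 0)


# The system under-reports: 3 beds shown, 5 actually free.
print(oracle_view(3, 5))      # True: deviates from the specification
print(admissions_view(3, 5))  # False: a patient can still be admitted
```

The same behaviour is a failure for one observer and not for the other, which is the point of treating failure as a judgement rather than an absolute.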
  • 11. Failures are inevitable
      • Technical reasons
          – When systems are composed of opaque and uncontrolled components, the behaviour of these components cannot be completely understood
          – Failures often can be considered to be failures in data rather than failures in behaviour
      • Socio-technical reasons
          – Changing contexts of use mean that the judgement on what constitutes a failure changes as the effectiveness of the system in supporting work changes
          – Different stakeholders will interpret the same behaviour in different ways because of different interpretations of ‘the problem’
  • 12. Conflict inevitability
      • It is impossible to establish a set of requirements where stakeholder conflicts are all resolved
      • Therefore, successful operation of a system for one set of stakeholders will inevitably mean ‘failure’ for another set of stakeholders
      • Groups of stakeholders in organisations are often in perennial conflict (e.g. managers and clinicians in a hospital). The support delivered by a system depends on the power held at some time by a stakeholder group.
  • 13. Normal failures
      • ‘Failures’ are not just catastrophic events but normal, everyday system behaviour that disrupts normal work and that means that people have to spend more time on a task than necessary
      • A system failure occurs when a direct or indirect user of a system has to carry out extra work, over and above that normally required to carry out some task, in response to some inappropriate or unexpected system behaviour
      • This extra work constitutes the cost of recovery from system failure
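Read this way, the cost of recovery is simply the extra work over the normal effort for a task. A minimal sketch follows; the task names and minute values are invented for illustration and do not come from the slides.

```python
def recovery_cost(normal_minutes, actual_minutes):
    """Extra work beyond the normal effort; zero if the task took no longer."""
    return max(0, actual_minutes - normal_minutes)


# Hypothetical log entries: (task, normal effort, actual effort), in minutes.
tasks = [
    ("admit patient", 10, 10),
    ("find bed", 5, 20),
    ("print label", 2, 1),
]

# Total recovery cost across the log: 0 + 15 + 0 minutes.
total = sum(recovery_cost(normal, actual) for _, normal, actual in tasks)
print(total)  # 15
```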
  • 14. The Swiss Cheese model
  • 15. Failure trajectories
      • Failures rarely have a single cause. Generally, they arise because several events occur simultaneously
          – Loss of data in a critical system
              • User mistypes command and instructs data to be deleted
              • System does not check and ask for confirmation of destructive action
              • No backup of data available
      • A failure trajectory is a sequence of undesirable events that coincide in time, usually initiated by some human action. It represents a failure in the defensive layers in the system
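The data-loss trajectory can be sketched as a delete operation with two optional defensive layers, a confirmation check and a backup. The function names, the dictionary-based store and the `confirm` callback are all assumptions made for this illustration.

```python
def delete(store, key, confirm=None, backup=None):
    """Delete `key` from `store`, passing through any defences that exist."""
    # Defence 1: ask for confirmation of the destructive action, if configured.
    if confirm is not None and not confirm(key):
        return "blocked"
    if key in store:
        # Defence 2: keep a copy before deleting, if a backup is configured.
        if backup is not None:
            backup[key] = store[key]
        del store[key]
    return "deleted"


def data_lost(store, backup, key):
    """True if the data exists neither in the store nor in a backup."""
    return key not in store and (backup is None or key not in backup)


# All three events coincide: mistyped command, no confirmation, no backup.
store = {"patients": [1, 2]}
delete(store, "patients")
print(data_lost(store, None, "patients"))  # True
```

If either defence is present, the same mistyped command does not lead to data loss: a `confirm` callback that returns `False` blocks the deletion, and a `backup` dictionary retains a copy.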
  • 16. Vulnerabilities and defences
      • Vulnerabilities
          – Faults in the (socio-technical) system which, if triggered by a human or technical error, can lead to system failure
          – e.g. missing check on input validity
      • Defences
          – System features that avoid, tolerate or recover from human error
          – e.g. type checking that disallows allocation of incorrect types of value
      • When an adverse event happens, the key question is not ‘whose fault was it’ but ‘why did the system defences fail?’
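As an illustration of a defence that closes the "missing check on input validity" vulnerability, the sketch below validates a bed count before it is accepted. The function name `set_empty_beds` and the specific rules (integer, non-negative) are assumptions for this example only.

```python
def set_empty_beds(value):
    """Accept a bed count only if it is a non-negative integer."""
    # Type check: reject strings, floats and booleans
    # (bool is a subclass of int in Python, so it is excluded explicitly).
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("bed count must be an integer")
    # Validity check: a negative number of beds makes no sense.
    if value < 0:
        raise ValueError("bed count cannot be negative")
    return value
```

A call such as `set_empty_beds("3")` or `set_empty_beds(-1)` is rejected at the boundary, so a typing slip by a user does not become an erroneous system state.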
  • 17. Reason’s Swiss Cheese Model
  • 18. Active failures
      • Active failures
          – Active failures are the unsafe acts committed by people who are in direct contact with the system or failures in the system technology.
          – Active failures have a direct and usually short-lived effect on the integrity of the defenses.
      • Latent conditions
          – Fundamental vulnerabilities in one or more layers of the socio-technical system such as system faults, system and process misfit, alarm overload, inadequate maintenance, etc.
          – Latent conditions may lie dormant within the system for many years before they combine with active failures and local triggers to create an accident opportunity.
  • 19. Defensive layers
      • Complex IT systems should have many defensive layers:
          – some are engineered: alarms, physical barriers, automatic shutdowns
          – others rely on people: surgeons, anesthetists, pilots, control room operators
          – and others depend on procedures and administrative controls.
      • In an ideal world, each defensive layer would be intact.
      • In reality, they are more like slices of Swiss cheese, having many holes, although unlike in the cheese, these holes are continually opening, shutting, and shifting their location.
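The Swiss cheese picture can be sketched in a few lines. Each layer is modelled as the set of positions where a hole is currently open, and a hazard gets through only if its position is a hole in every layer. The representation, the function names and the independence assumption in `breach_probability` are simplifications introduced here, not part of Reason's model as presented in the slides.

```python
from math import prod


def trajectory_passes(layers, position):
    """True if `position` is an open hole in every defensive layer."""
    return all(position in holes for holes in layers)


def breach_probability(hole_probs):
    """Chance that all layers fail at once, ASSUMING the layers fail
    independently (real layers often share causes, so this is optimistic)."""
    return prod(hole_probs)


# Snapshot of three layers: engineered, human, procedural.
layers = [{1, 2}, {2, 3}, {2}]
print(trajectory_passes(layers, 2))  # True: the holes line up at position 2
print(trajectory_passes(layers, 1))  # False: stopped by the second layer
```

Because the holes open, shut and shift over time, the `layers` snapshot would differ from one moment to the next. A trajectory that is blocked now may pass later.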
  • 20. Dynamic vulnerabilities
      • While some vulnerabilities are static (e.g. programming errors), others are dynamic and depend on the context where the system is used.
      • For example
          – vulnerabilities may be related to human actions whose performance is dependent on workload, state of mind, etc. An operator may be distracted and forget to check something
          – vulnerabilities may depend on configuration: checks may depend on particular programs being up and running, so if program A is running in a system then a check may be made, but if program B is running, then the check is not made
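The configuration-dependent case can be sketched as follows. The program names "A" and "B" follow the example above. The `process` function and the rule that negative values are invalid are hypothetical.

```python
def check_is_active(running_programs):
    """Assumption: the validity check is implemented only in program A."""
    return "A" in running_programs


def process(value, running_programs):
    """Accept or reject a value, depending on what happens to be running."""
    if check_is_active(running_programs) and value < 0:
        return "rejected"
    return "accepted"


print(process(-1, {"A"}))  # rejected: the check is made
print(process(-1, {"B"}))  # accepted: same input, but the check is absent
```

The code itself is unchanged between the two calls. The vulnerability appears and disappears with the runtime configuration, which is why it cannot be found by inspecting the program alone.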
  • 21. Recovering from failure
  • 22. Coping with failure
      • People are good at coping with unexpected situations when things go wrong.
          – They can take the initiative, adopt responsibilities and, where necessary, break the rules or step outside the normal process of doing things.
          – People can prioritise and focus on the essence of a problem
  • 23. Recovery strategies
      • Local knowledge
          – Who to call; who knows what; where things are
      • Process reconfiguration
          – Doing things in a different way from that defined in the ‘standard’ process
          – Work-arounds, breaking the rules (safe violations)
      • Redundancy and diversity
          – Maintaining copies of information in different forms from that maintained in a software system
          – Informal information annotation
          – Using multiple communication channels
      • Trust
          – Relying on others to cope
  • 24. Design for recovery
      • Holistic systems engineering
          – Software systems design has to be seen as part of a wider process of socio-technical systems engineering
      • We cannot build ‘correct’ systems
          – We must therefore design systems to allow the broader socio-technical systems to recognise, diagnose and recover from failures
      • Extend current systems to support recovery
      • Develop recovery support systems as an integral part of systems of systems
  • 25. Recovery strategy
      • Designing for recovery is a holistic approach to system design and not (just) the identification of ‘recovery requirements’
      • Design should support the natural ability of people and organisations to cope with problems
          – Ensure that system design decisions do not increase the amount of recovery work required
          – Make system design decisions that make it easier to recover from problems (i.e. reduce extra work required)
      • Earlier recognition of problems
      • Visibility to make hypotheses easier to formulate
      • Flexibility to support recovery actions
  • 26. Key points
      • Failures are inevitable in complex systems because multiple stakeholders see these systems in different ways and because there is no single manager of these systems
      • Failures are a judgement – they are not absolute – but depend on the system observer
      • The Swiss cheese model is a failure model based on active failures (trigger events) and latent errors (system vulnerabilities).
      • People have developed strategies for coping with failure and systems should not be designed to make coping more difficult.