1. Organisational Failure
Prof Ian Sommerville
Organisational Failure, York EngD Course in LSCITS, 2012 Slide 1
2. Organisational failure
• Why and how organisational factors can contribute to
system failures
3. Why do organisations matter?
• Organisations have multiple, inter-related, potentially
conflicting goals:
– Efficient resource utilisation
– Timely delivery of products/services
– Customer satisfaction
– Owner satisfaction
– Regulatory compliance
– Safety and dependability
– Maintenance of reputation/brand
– Future development
4. Decision making
• Organisational decision making involves taking all of
these into account
– Inevitably, this sometimes means making compromises that
affect the safety and dependability of a system
• These compromises lead to vulnerabilities and
hazards that may then compromise the safety or
dependability of the system
• In complex organisations, there are competing
priorities in different parts of the organisation
– Shifting power and authority in an organisation affects
decision making
– May be deliberate lack of communications across the
organisation
5. NASA Challenger disaster
• Space shuttle exploded shortly after take-off
• The cause was the failure of rubber seals (O-rings)
that allowed hot gas to escape and make contact with
fuel tanks which then exploded
• Subsequent enquiry showed that O-ring failure was
due to brittleness at low temperatures
• Arguably, decision makers were complacent because
– Redundant (primary and secondary) O-rings in the system
– Damage to primary O-rings had been tolerated in previous
launches
6. Organisational failure
• Engineers were concerned about launching in low
temperatures and advised against launch
• But goals other than safety and dependability took
precedence and engineers were overruled
– ‘Owner’ satisfaction
• already several delays to flight
– Future planning
• NASA wanted a success to support budget negotiations
– Resource utilisation
• Reluctance to address known problem with O-rings because of
costs
7. Normal accidents
• Developed by Charles Perrow who conducted a study of a
nuclear accident in the USA (Three Mile Island)
• Official conclusion was that the problems were due to
“human error”
• Perrow disagreed with this and argued that failures are
‘normal’ and inevitable in complex systems which have:
– Interactive complexity
• The presence of unfamiliar, unplanned and unexpected sequences
of events in a system that are not visible or immediately
comprehensible
– Tight coupling
• The presence of interdependent components.
• Tight coupling will make a system more prone to cascading errors.
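The effect of tight coupling on cascading errors can be illustrated with a small sketch. This is not Perrow's own model; it is a minimal illustration that assumes a failure always propagates to every directly dependent component. The component names and the `cascade` function are hypothetical.

```python
from collections import deque


def cascade(dependents, initial_failure):
    """Return the set of components that fail after `initial_failure` fails.

    `dependents` maps a component to the components that fail when it fails.
    Assumption: propagation is certain and immediate (fully tight coupling).
    """
    failed = {initial_failure}
    queue = deque([initial_failure])
    while queue:
        component = queue.popleft()
        for dependent in dependents.get(component, ()):
            if dependent not in failed:
                failed.add(dependent)
                queue.append(dependent)
    return failed


# Hypothetical system in which each component depends on the previous one.
tightly_coupled = {
    "pump": ["cooling"],
    "cooling": ["reactor"],
    "reactor": ["turbine"],
}

# Same components, but a pump failure is assumed to be absorbed by a
# buffer that only raises an alarm, so the chain is broken.
loosely_coupled = {
    "pump": ["buffer_alarm"],
    "cooling": ["reactor"],
    "reactor": ["turbine"],
}

print(sorted(cascade(tightly_coupled, "pump")))
print(sorted(cascade(loosely_coupled, "pump")))
```

In the tightly coupled configuration a single pump failure reaches every component; in the loosely coupled one it stops at the buffer.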
9. Redundancy
• The use of redundancy is a fundamental technique in
achieving system safety
– Primary and secondary O-rings on space shuttle
– Quintuple redundancy in Airbus FCS
• Failure of primary system can be tolerated
• Perrow argues that redundancy can decrease rather
than increase safety:
– Increases complexity and coupling in the system
– Provides reassurance that system faults can be tolerated
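A related way in which redundancy can deliver less than expected is common-cause failure, where one condition defeats both the primary and the backup (as low temperature did for the shuttle O-rings). The sketch below is not Perrow's argument, which concerns complexity and reassurance; it is a simple assumed model, and the function name and all numbers are illustrative only.

```python
def pair_failure_probability(p_single, p_common_cause=0.0):
    """Probability that both units in a redundant pair fail.

    Assumed model: with probability `p_common_cause` a shared condition
    defeats both units at once; otherwise the two units fail
    independently, each with probability `p_single`.
    """
    independent_both = p_single ** 2
    return p_common_cause + (1.0 - p_common_cause) * independent_both


# Illustrative numbers only; they are not taken from any real system.
p = 0.01
print(pair_failure_probability(p))         # independent failures only
print(pair_failure_probability(p, 0.005))  # with a small common-cause term
```

With these illustrative numbers the common-cause term dominates: the pair is roughly fifty times more likely to fail than the independence assumption suggests.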
10. Failures or successes
• Normal accident theory is based on extensive studies
of system failures
• It argues that failure is systemic and an inherent
characteristic of the system itself
• Alternative perspective is based on studies of
success
– Why are there some areas that are apparently complex (e.g.
air traffic management) where failures are relatively
uncommon?
• Led to the notion of high-reliability organisations
11. Failure-free organisations?
• High-reliability organisation (HRO) researchers
disagree that complex, highly interdependent
systems will inevitably have accidents
– They believe organisations are able to compensate for
technical shortcomings through their methods of operation. In
essence, they argue that organisations can be ‘failure free’.
• Based on studies of ‘reliable’ organisations
– Aircraft carriers
– Air traffic control
– Nuclear power stations
– Intensive care units
12. Aircraft carrier flight operations
13. Nuclear powered carriers
• Complex systems
– Carriers are 24 stories high and carry enough fuel for 15
years. 2000 telephones. 3,360 compartments and spaces
– Multiple software intensive systems (command systems,
aircraft software)
– Dangerous objects (aircraft, fuel, and explosives) in close
proximity.
– Aircraft taking off and landing in 48-60 second intervals.
– 6000 crew. Several different kinds of aircraft, multiple
squadrons.
– All work interdependently and must be coordinated.
14. Nuclear powered carriers
• High risk
– Nuclear reactor accidents
– Fire, flooding, grounding, collision
– Fuel and weapons explosions
– Mistaken identification of friends and foes
– High risks both to crew and a much larger public
• High reliability
– Low “crunch rates”
– comparatively few major accidents
• High reliability achieved through organisational
design
15. High Reliability Organisations
• High Reliability Organisations (HROs) have particular
qualities
– Reliability takes precedence over efficiency
– Preoccupation with failure, not success
– Share the big picture
– Focus on details
– Migrate decisions
16. Reliability over Efficiency
– Reliability comes before efficiency but cannot replace it
– Decisions are made on the grounds of reliability first and then
efficiency
– Efficiency initiatives are treated with scepticism
– Managers regularly talk to staff to learn how they do their
work and why. This stops managers focusing just on figures.
– Organisations develop safety measures as well as financial
measures, and include these in employee evaluations
– Organisations assign value to the avoidance of accidents
– High redundancy despite cost
– Cautious actions when necessary despite cost
17. Preoccupation with Failure
• HROs recognise that:
– Workers need to be heedful of the possibility of failure
– Failures are normal but accidents should be avoided
– Acknowledge there can be unexpected failure modes, even
in common activities
• HROs address failure by:
– Constant training of all people (simulations, apprenticing,
practice)
– Using incident reporting
– Designing in extensive redundancy
– Maintaining contingencies for critical operations
– Requiring proof that something is safe, rather than proof that it is unsafe
18. Carrier operations
– There is constant tracking of issues around malfunctioning,
defective and substandard equipment.
• They act on these by training crew how to overcome problems
and pressuring vendors to make improvements
– Extensive redundancy (overlapping jobs, multiple channels
and centres of communications, spare parts, multiple sources
for decision making).
• Example: if an aircraft's landing gear warning light comes on, the
spotter, commander and pilot all work together to establish what
the issue is.
– Multiple contingencies are maintained
• Example: There will always be multiple options for how to land
the plane (or for the pilot to escape).
19. Sharing the Big Picture
• HROs recognise that:
– If people are narrowly focused they will act only in their own
interest
– People need to maintain awareness of other people and
events around the organisation
• HROs
– Train people broadly
– Educate people about overarching objectives, and set
statements of purpose
– Give people access to information on what is happening
elsewhere
– Clearly specify how people and teams fit into the whole
20. Reluctance to Simplify
• HROs are reluctant to simplify
• All organisations have to simplify and abstract, to filter out
unnecessary information (particularly for getting “big pictures”)
• However, HROs
– Use labels and categories as little as possible as they stop
you from looking further into details and events.
– Continually rework labels and categories
– Listen to wisdom, but with scepticism
– Do not focus on information that supports expectations, but
focus on that which doesn’t fit or disconfirms desires
21. Migration of decision making
• HROs migrate decision making as far down the
organisation as possible
– Decisions are not made by one central authority
• HROs recognise:
– Decisions need to be made where there is expertise
– Decisions often need to be made quickly
– People must be trained in making decisions and are given
the right resources to do so
– Skill and legitimacy are spread through the organisation, and
people are trusted
22. HROs and Normal Accidents
• HRO theory is sometimes presented as conflicting
with Normal Accidents
– HRO proponents may argue that accidents are not ‘normal’
– Leveson critiques work on HROs and argues that they are
not based on concerns of tightly coupled systems
• Arguably, an HRO is an organisation that has taken
active steps to:
– reduce coupling and
– reduce interactions
– Once that has been achieved, the driver for HROs is
perhaps a strong ‘safety culture’ to promote safety across the
organisation
23. Organisational vulnerabilities
• Organisational vulnerabilities are characteristics of an
organisation that weaken defensive layers and so
may lead to system failure.
• Examples of organisational vulnerabilities
– Over-reliance on process to achieve safety/dependability
– Responsibility failures
– Weak safety/dependability culture
– Under-resourcing of safety
24. Over-reliance on process
• Quality standards such as ISO 9000 place great
emphasis on process and process assurance
– Implication of these standards is that process is paramount
• This tends to promote a belief that focusing on
process is the way to achieve safety and
dependability
• However, processes are never isolated and have to
be enacted in a dynamic context
• Sometimes necessary to deviate from the ‘normal’
process to achieve safety and dependability
25. Responsibility failures
• System failures are often a consequence of
responsibility failures
– Unassigned responsibility
– Misassigned responsibility
– Misunderstood responsibility
– Duplicated responsibilities
– Responsibility overload
– Responsibility fragility
• Responsibility failures may be a consequence of poor
communications and/or under-resourcing
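Some of the responsibility failures listed above can be checked mechanically if responsibilities are recorded explicitly. The sketch below is one possible illustration, not a method from the lecture: the function name, the data layout and the overload threshold `max_load` are all assumptions, and it covers only unassigned, duplicated and overloaded responsibilities.

```python
from collections import defaultdict


def audit_responsibilities(tasks, assignments, max_load=3):
    """Flag three kinds of responsibility failure.

    tasks:       iterable of task names that someone must own
    assignments: list of (person, task) pairs
    max_load:    assumed threshold above which a person counts as overloaded
    """
    owners = defaultdict(set)  # task -> people responsible for it
    load = defaultdict(set)    # person -> tasks they are responsible for
    for person, task in assignments:
        owners[task].add(person)
        load[person].add(task)

    return {
        "unassigned": sorted(t for t in tasks if not owners.get(t)),
        "duplicated": sorted(t for t in tasks if len(owners.get(t, ())) > 1),
        "overloaded": sorted(p for p, ts in load.items() if len(ts) > max_load),
    }


# Hypothetical tasks and people, for illustration only.
tasks = ["backup", "patching", "incident_review", "access_control"]
assignments = [
    ("alice", "backup"),
    ("alice", "patching"),
    ("bob", "patching"),
    ("alice", "access_control"),
]
report = audit_responsibilities(tasks, assignments, max_load=2)
print(report)
```

Misassigned and misunderstood responsibilities depend on what people believe and are competent to do, so a check of this kind cannot detect them.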
26. Organisational culture
• “The way that we do things around here”
• Culture may conflict with public statements of
priorities
– “The patient comes first”
– “Safety is our goal”
• Investment banking
– High risk, high reward
– Lack of regulation or weak compliance with regulations
– Large-scale failures
27. Safety culture
• Some organisations have developed a strong safety
culture where safety is seen as a priority by all
members of the organisation
• Safety culture (UK HSE)
– “The product of individual and group values, attitudes,
perceptions, competencies, and patterns of behaviour that
determine the commitment to, and the style and proficiency
of, an organization’s health and safety management”
30. Under-resourcing
• If operations are under-resourced then safety and
dependability are often sacrificed
• Organisational priorities focus on optimising resource
utilisation to continue service delivery
– Safety and dependability may be seen as an avoidable
overhead
• Example
– Cleaning services in hospital outsourced to save money
– Competitive tender
– Under-resourced so quality of service reduced
• Consequent increase in hospital acquired infections
31. Complex systems
• Complexity = Coupling + Interaction
• Lesson for LSCITS
– Increasing complexity will lead to unpredictable system failure
– Strive to build LSITS rather than LSCITS
• Improve safety by
– Reducing coupling
– Reducing interactions
– Redundancy may not improve safety as it increases complexity in
the system
• Address problems at organisational as well as the system level
32. Key points
• Organisational decisions, influenced by structure and
culture, often have a major impact on safety and
dependability
• Normal Accident Theory postulates that accidents are
inevitable in complex, tightly coupled systems
• High-reliability organisations aim to achieve safety
through a set of practices that aim to reduce failures
• Organisational vulnerabilities include over-reliance on
process, responsibility failures, poor safety culture
and under-resourcing