Learning from Incidents
at Auto Trader
@4ndyHumphrey
Learning from Failure
at Auto Trader
@4ndyHumphrey
Learning from Incidents
• What is a Learning Organisation?
• What is the Reality?
• What are my Choices?
• Incident Reviews - things to Avoid
• Incident Reviews - things to Encourage
• What about holding people to Account?
• A bit on Our process
Our Customers
Private Car Sellers: 30,000
Trade Car Dealers: 15,000
Our People
Auto Trader Staff: 850
Product & Tech Teams: 275
Our Technology Platform
1.2 billion page views per month
70 million peak page views per day
15 million unique visitors per month
Supported by 100 live applications
Further Reading up front
Links:
John Allspaw - The Infinite Hows
Steven Shorrock - If It Weren't for the People
EuroControl - Systems Thinking for Safety
Lindsay Holmwood - Blame-Language-Sharing
Sidney Dekker - Just Culture
People:
Steven Shorrock
Erik Hollnagel
Sidney Dekker
Matthew Syed
John Allspaw
Lindsay Holmwood
Dave Zwieback
Nancy Leveson
Books:
Black Box Thinking - Matthew Syed
Field Guide to Understanding Human Error - Sidney Dekker
Beyond Blame - Dave Zwieback
Engineering a Safer World - Nancy Leveson
What is a Learning Organisation?
The Loom
A Learning Organisation
Moral Responsibility
Job Satisfaction
Economic Imperative
Why should I want to learn?
What’s the reality?
Blame management
Blame - Fundamental Attribution Error
Blame - Justice
Blame - Hindsight
Blame – Bad Apple Theory
Blame – Ignoring context
Jonathan Caramanus/Green Renaissance/wwf.org.uk
Blame - It’s Easy
What are my choices?
Things will always go wrong
https://www.youtube.com/watch?v=EvegBo4TUdQ
You can blame people…
Or say it’s a one off…
Or you can look at the context…
…Learn and make changes
But it is a choice:
“Blame is the enemy of safety…” - Nancy Leveson
“Whenever there is fear, you will get wrong figures.” - W. Edwards Deming
Incident Reviews:
Things to avoid
Culture of fear
Top down
Asking Why?
Questioning styles: Dilts Model
Environment: Contexts - WHERE?
Capabilities: Methods, Approaches - HOW?
Behavior: Skills and Actions - WHAT?
Values and Beliefs: Motivation and permission - WHY?
Identity: Sense of Self, Role - WHO?
Dilts Model: Don’t go too Deep!
Environment: Contexts - WHERE?
Capabilities: Methods, Approaches - HOW?
Behavior: Skills and Actions - WHAT?
Values and Beliefs: What is important/true - WHY?
Identity: Sense of Self - WHO?
Single Root Cause
Points scoring
Incident Reviews:
How to encourage learning
Priming
Keep an open mind
Explore how events unfolded
Incident Review Prompts
(from The Field Guide To Understanding Human Error, by Sidney Dekker)
At each juncture in the sequence of events (if that is how you want to structure this part of the accident story), you want to get to know:
• Which cues were observed (what did he or she notice/see or did not notice what he or she had expected to notice?)
• What knowledge was used to deal with the situation? Did participants have any experience with similar situations that was useful in dealing with this one?
• What expectations did participants have about how things were going to develop, and what options did they think they have to influence the course of events?
• How did other influences (operational or organizational) help determine how they interpreted the situation and how they would act?
Here are some questions Gary Klein and his researchers typically ask to find out how the situation looked to people on the inside at each of the critical junctures:
Cues: What were you seeing? What were you focusing on? What were you expecting to happen?
Interpretation: If you had to describe the situation to your colleague at that point, what would you have told?
Errors: What mistakes (for example in interpretation) were likely at this point?
Previous experience/knowledge: Were you reminded of any previous experience? Did this situation fit a standard scenario? Were you trained to deal with this situation? Were there any rules that applied clearly here? Did any other sources of knowledge suggest what to do?
Goals: What were you trying to achieve? Were there multiple goals at the same time? Was there time pressure or other limitations on what you could do?
Taking action: How did you judge you could influence the course of events? Did you discuss or mentally imagine a number of options or did you know straight away what to do?
Outcome: Did the outcome fit your expectation? Did you have to update your assessment of the situation?
Communications: What communication medium(s) did you prefer to use (phone, chat, email, video conf, etc.)? Did you make use of more than one communication channel at once?
Help: Did you ask anyone for help? What signal brought you to ask for support or assistance? Were you able to contact the people you needed to contact?
Debriefings need not follow such a scripted set of questions, of course, as the relevance of questions depends on the event. Also, the questions can come across to participants as too conceptual to make any sense. You may need to reformulate them in the language of the domain.
Timelines
1. Factual timeline entries can be filled in prior to the Review Meeting:
14:00 Alert received from Site Confidence
15:15 Incident communication sent
16:00 Incident closure comms sent
2. As a group, overlay the basic timeline with key decisions and junctures:
13:10 Slow server performance observed by Bill
14:20 Bill spoke to John about SC issues and decided to recover DB
15:50 John finished DB recovery
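The two-step timeline above can be sketched in code. This is only a rough illustration, assuming all entries fall on the same day; the `Entry` class, the `overlay` function and the "fact"/"decision" labels are invented for the example and are not part of the Auto Trader process or tooling.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Entry:
    at: time   # time of day the entry refers to (single-day assumption)
    text: str
    kind: str  # "fact" or "decision": labels invented for this sketch


def overlay(facts, decisions):
    """Merge both lists into one timeline ordered by time of day."""
    return sorted([*facts, *decisions], key=lambda entry: entry.at)


# Step 1: factual entries, filled in before the review meeting.
facts = [
    Entry(time(14, 0), "Alert received from Site Confidence", "fact"),
    Entry(time(15, 15), "Incident communication sent", "fact"),
    Entry(time(16, 0), "Incident closure comms sent", "fact"),
]

# Step 2: decisions and junctures, added by the group during the review.
decisions = [
    Entry(time(13, 10), "Slow server performance observed by Bill", "decision"),
    Entry(time(14, 20), "Bill spoke to John about SC issues and decided to recover DB", "decision"),
    Entry(time(15, 50), "John finished DB recovery", "decision"),
]

timeline = overlay(facts, decisions)
for entry in timeline:
    print(entry.at.strftime("%H:%M"), f"[{entry.kind}]", entry.text)
```

Keeping the factual entries separate from the group's additions means the pre-filled list stays unchanged, and the merged view shows what people noticed and decided around each recorded event.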
One conversation
Actions
Impartial facilitator
Investigate what went well
Practice – make it habit
What about holding people to account?
Accountability
Our process:
Our Process
• Major Incidents
• High Severity Incidents
• Failed Releases (all)
• Failed Changes (Large)
Priming – Timeline - Actions
Why we are here:
We understand and truly believe that everyone did the best job they could, given what they knew at the time, their skills and abilities, the resources available, and the situation at hand.
We are here to learn and find solutions to improve our ways of working.
Things that help us learn:
• Open Minded
• Go back in time
• No single ‘Root Cause’
• How not Why
Things that stop us learning:
• Blaming people
• Human Error
• Arse Covering
• Points scoring
• ‘Trying Harder’
• Talking over people
After the review:
• Incident details recorded
• Actions (owners, dates) recorded
• Owned by Service Management Team
Questions?


Editor's Notes

  1. Private Sellers: us, selling our own cars. Trade Car Dealers (15,000): independent dealers, franchise dealers, car supermarkets.
  2. Availability at 99.99%, supporting products for Consumers, Private Sellers and Trade Retailers. Supporting access across multiple platforms. Supporting Commercial and International Autotrader sites, e.g. Dealer Websites. Automotive leader for dealer websites, with just under 5,000 dealers’ sites hosted.
  3. Peter Senge, The Fifth Discipline (1990). Learning and transformation are central functions of the organisation: always changing, never steady state. A Learning Organisation is a term given to a company that facilitates the learning of its members and continuously transforms itself. A learning organisation is a place where people are continually discovering how they create their reality.
     “The loss of the stable state means that our society and all of its institutions are in continuous processes of transformation. We cannot expect new stable states that will endure for our own lifetimes. We must learn to understand, guide, influence and manage these transformations. We must make the capacity for undertaking them integral to ourselves and to our institutions. We must, in other words, become adept at learning. We must become able not only to transform our institutions, in response to changing situations and requirements; we must invent and develop institutions which are ‘learning systems’, that is to say, systems capable of bringing about their own continuing transformation.” (Schön 1973: 28) http://infed.org/mobi/the-learning-organization/
  4. A story from Toyota’s origins, when it used to build automatic looms. Upon hearing that the plans for one of the looms had been stolen, Kiichiro Toyoda is said to have remarked:
     “Certainly the thieves may be able to follow the design plans and produce a loom. But we are modifying and improving our looms every day. So by the time the thieves have produced a loom from the plans they stole, we will have already advanced well beyond that point. And because they do not have the expertise gained from the failures it took to produce the original, they will waste a great deal more time than us as they move to improve their loom. We need not be concerned about what happened. We need only continue as always, making our improvements.”
     The long-term value of an enterprise is not captured by the value of its products and intellectual property but rather by its ability to continuously increase the value it provides to customers, and to create new customers, through learning and innovation. (Lean Enterprise, p. 18)
  5. And we all do this within our organisations, right? ITIL: continuous improvement. Deming Cycle: PDCA. OODA. DMAIC: Six Sigma lean process improvement.
  6. Our attitudes, culture and behavior prevent learning
  7. WHY DO WE DO IT??? Fundamental Attribution Error: how do we explain the behavior of others? It turns out that we are biased. We explain the behavior of others as due to their personality, but explain our own behavior as a result of context. We need to overcome this bias to learn from incidents and other kinds of failure. Image: http://www.ffxiah.com/forum/topic/26676/fundamental-attribution-error
  8. WHY DO WE DO IT??? We assume that:
     • All accidents or incidents require a human mistake
     • The severity of the accident is proportional to the size of the mistake
     • Punishment acts as a deterrent to prevent issues happening in the future
     Need for retributive justice: punishment, deterrent. We often diminish the need for restorative justice: preventing the issue happening again.
  9. WHY DO WE DO IT??? Hindsight bias (the knew-it-all-along effect) is the inclination, after an event has occurred, to see the event as having been predictable, despite there having been little or no objective basis for predicting it. Narrative written after the fact. Does not make sense. Example: England football manager Fabio Capello, from Black Box Thinking by Matthew Syed. He was in English football from 2008 to 2012. He introduced a strict regime of diet, rules around lateness, and bans for family members from training and tournaments. He was pretty successful and lots of commentators put this down to
  10. Retributive vs. Restorative Justice. This table illustrates the differences in the approach to justice between Retributive Justice and Restorative Justice. As you will see, Restorative Justice is much more community centric and focuses on making the victim whole.
     Retributive Justice | Restorative Justice
     Crime is an act against the state, a violation of a law, an abstract idea | Crime is an act against another person and the community
     The criminal justice system controls crime | Crime control lies primarily in the community
     Offender accountability defined as taking punishment | Accountability defined as assuming responsibility and taking action to repair harm
     Crime is an individual act with individual responsibility | Crime has both individual and social dimensions of responsibility
     Punishment is effective: threats of punishment deter crime; punishment changes behavior | Punishment alone is not effective in changing behavior and is disruptive to community harmony and good relationships
     Victims are peripheral to the process | Victims are central to the process of resolving a crime
     The offender is defined by deficits | The offender is defined by capacity to make reparation
     Focus on establishing blame or guilt, on the past (did he/she do it?) | Focus on the problem solving, on liabilities/obligations, on the future (what should be done?)
     Emphasis on adversarial relationship | Emphasis on dialogue and negotiation
     Imposition of pain to punish and deter/prevent | Restitution as a means of restoring both parties; goal of reconciliation/restoration
     Community on sideline, represented abstractly by state | Community as facilitator in restorative process
     Response focused on offender’s past behavior | Response focused on harmful consequences of offender’s behavior; emphasis is on the future
     Dependence upon proxy professionals | Direct involvement by participants
  11. WHY DO WE DO IT??? Bad Apple Theory: complacency. We assume that systems and procedures are safe and reliable. It’s only a few ‘bad apples’.
     Steven Shorrock, http://radar.oreilly.com/2014/11/if-it-werent-for-the-people.html: “Our view is often that the system is basically safe, so long as the human works as imagined. When things go wrong, we have a seemingly innate human tendency to blame the person at the sharp end. We don’t seem to think of that someone – pilot, controller, train driver or surgeon – as a human being who goes to work to ensure things go right in a messy, complex, demanding and uncertain environment.”
  12. Work as imagined vs work as done. We don’t understand:
     • Trade-offs
     • Competing pressures
     • Conflicting incentives
     • Procedures adapted for the real world
  13. Blame is easy:
     • It removes accountability from the organisation
     • We don’t need to consider organizational changes or system changes (difficult things)
     • It removes the need for self-criticism
     • It’s cheap
     • It’s quick
  14. Miss Universe 2015. Steve Harvey, a veteran TV presenter in America, announced the winner as Miss Colombia and not Miss Philippines.
  15. It’s just one of those things that happen. Human Error. https://www.linkedin.com/pulse/how-bad-design-wrecked-steve-harveys-universe-eric-Thomas
  16. It’s just one of those things that happen. Human Error.
  17. Lights, sounds. What was on the card? What was on the teleprompter?
  18. https://www.linkedin.com/pulse/how-bad-design-wrecked-steve-harveys-universe-eric-Thomas
  19. Blame impact:
     • Fewer issues reported
     • Culture of fear
     • Less responsibility taken: safety is someone else’s responsibility to implement
     • The wrong data: incorrect accounts
     • Dishonesty: distortion
     • Denying error, diminishing the impact
  20. Our attitudes, culture and behavior prevent learning
  21. Often learnt behavior from leaders. Will prevent learning.
     John Allspaw, Blameless Postmortem, Web Operations. We need to find ways to allow practitioners to tell their stories:
     • Without fear that there will be retribution
     • In a supportive atmosphere where failure is not stigmatized
     • Where we regularly talk about (celebrate) our mistakes and take ownership of improving things
     This cycle of name/blame/shame can be looked at like this:
     1. Engineer takes action and contributes to a failure or incident.
     2. Engineer is punished, shamed, blamed, or retrained.
     3. Reduced trust between engineers on the ground (the “sharp end”) and management (the “blunt end”) looking for someone to scapegoat.
     4. Engineers become silent on details about actions/situations/observations, resulting in “Cover-Your-Ass” engineering (from fear of punishment).
     5. Management becomes less aware and informed on how work is being performed day to day, and engineers become less educated on lurking or latent conditions for failure due to silence mentioned in #4, above.
     6. Errors more likely, latent conditions can’t be identified due to #5, above.
     7. Repeat from step 1.
  22. Need a wide range of review attendees taking actions. Trust between engineers taking action (sharp end) and managers (blunt end). Especially important to share these actions across teams, departments, disciplines. Things to avoid – managers taking no action, or all the actions!
  23. Refer to John Allspaw – The Infinite Hows
  24. DILTS Model – logical levels – levels of learning and change. Useful as a coaching aim. How the language you use can affect the impact and depth to which you get a response. Asking Who and Why really probes deep through these logical levels. John Allspaw – The Infinite Hows: 1. A new release disabled a feature for some customers. WHY? Because a particular server failed. 2. Environment – Contexts; Behavior – Skills and Actions; Capabilities – Methods, Approaches, Strategies; Values and Beliefs – What is important and true; Identity – Your sense of self.
  25. Why Asks people to justify their actions Leads to Who
  26. There is no single root cause in any incident involving complex systems (all our incidents)
  27. Cherry picking of data to prove pre-existing ideas about what happened. WHAT YOU FIND IS WHAT YOU LOOK FOR. Points scoring: It’s easy to use examples of when things go wrong to prove a point or win battles with others. This is generally cherry picking of information and unhelpful to us as an organisation. If unchallenged it will lead to more defensiveness, hiding/manipulation of data etc.
  28. Our attitudes, culture and behavior prevent learning
  29. Good psychological effect. States what’s expected. Frames the conversation. Example from Matthew Syed again – priming experiment and walking the corridor. Good example – Agile Prime Directive: "Regardless of what we discover, we understand and truly believe that everyone did the best job they could, given what they knew at the time, their skills and abilities, the resources available, and the situation at hand."
  30. Open Minded: Everyone is expected to come to an incident review keen to learn new information and listen to the experiences and stories of their colleagues. It’s not acceptable to bring your pre-formulated, rigid ideas of what happened/causes/solutions. Explore differences of opinion. Listen to people’s stories of how events unfolded.
  31. Focus on going back in time. Understand the nature of the events AS THEY UNFOLDED over time. Are we talking about ordinary routine work? A special event? Something never seen before? Consider the predictability of the event at the time, and what was known at the time.
  32. Go back in time. What information was available to you at this point? What cues, what alerts, what information was available? What other pressures did you have? Time pressure, multiple focusses.
  33. Actual vs Ideal. The timeline should probably be created by the people attending the incident review, but we’ve amended this so that a ‘factual’ bare bones of the timeline is pre-populated by the Incident owner prior to the meeting to save time. Some facts can be added to the timeline before the meeting to save time – the Duty Manager can collate this data from Chat, Logs, Emails etc. When adding information as a group about decisions made, explore the differences between people’s perceptions of what happened. Be careful not to get trapped into a ‘single root cause’, and listen to as many contributing factors as possible.
  34. Actual vs Ideal. Some facts can be added to the timeline before the meeting to save time – the Duty Manager can collate this data from Chat, Logs, Emails etc. When adding information as a group about decisions made, explore the differences between people’s perceptions of what happened. Be careful not to get trapped into a ‘single root cause’, and listen to as many contributing factors as possible.
  35. Ensure everyone gets a chance to speak and be listened to by everyone. Need to keep the whole room to one conversation.
  36. Actions shared, visible, completed Don’t always have to have an action! It might be that understanding how colleagues dealt with the incident and learning more about normal working of your organisation is enough.
  37. Are you the right person to run the incident review? Are you seen as impartial? I’ve done this! Give example of not doing this right. Are you the right person to lead this Review? Are you independent enough to be fair and impartial? And be seen by others as such?
  38. Celebrate the things that went well (timeline shows how well the response unfolded, can report on how well people worked together). Not just a pat on the back: analyse the things that worked well – good decisions that prevented more downtime, how people adapted what they knew to a new situation. How can the good patterns be replicated or enhanced even further? If you truly understand what went well and HOW it went well, you can reproduce this in more situations.
  39. Retrospectives Reviews If you only review the most serious of incidents – you will not get the atmosphere right, people will not be used to it People will be defensive
  40. So this is all great, but what about when people need to be held responsible for their actions? Good PDF: http://www.saa.com.sg/saaWeb2011/export/sites/saa/en/Publication/downloads/JustCulture_ReportingtheLine_Accountability.pdf Negligence (turning up to work drunk). Malicious damage (intentionally trying to harm people or the organisation). Incompetence (making stupid mistakes, not following clear procedures). Gross misconduct. E.g. defined in a nursing malpractice situation, negligence means the following: The doing of something which a reasonably prudent person would not do, or the failure to do something which a reasonably prudent person would do, under circumstances similar to those shown by the evidence. http://ccn.aacnjournals.org/content/23/5/72.full http://www.saa.com.sg/saaWeb2011/export/sites/saa/en/Publication/downloads/JustCulture_ReportingtheLine_Accountability.pdf Accountability is often interpreted as blaming practitioners for mistakes. This creates a conflict between learning and accountability. This paper proposes three simultaneous directions to achieve a Just Culture: a) not using incident reports as evidence for disciplinary action, b) deciding and getting broad support for who gets to decide what is acceptable and unacceptable behaviour, and c) switching from blame and backward-looking accountability to forward-looking accountability.
  41. We have a poor view of accountability. BLAME does not equal accountability. Blame has a massive cost. Blame limits accountability. We need to encourage forward-looking accountability: a) not using incident reports as evidence for disciplinary action, b) deciding and getting broad support for who gets to decide what is acceptable and unacceptable behaviour, and c) switching from blame and backward-looking accountability to forward-looking accountability. Cost: The fear of blame, sanction and punishment, however, is known to change the behaviour of practitioners. They might be induced to hide, downplay or redefine incidents, rather than reporting and sharing them (Merry & Smith, 2001), creating a culture of ‘risk secrecy.’ The possibility of disciplinary action (or worse, prosecution) creates a conflict between accountability and learning. Blame is known as the enemy of safety (Leveson, 2011). Forward-looking accountability needs an environment which encourages sharing accounts and takes away the idea of blame.
  42. All Major Incidents and High Severity Incidents have a review All failed large changes have a review (including things that should have been large) All failed Releases have a review We use a similar format for Team / Project retrospectives - certainly in atmosphere
  43. All Major Incidents and High Severity Incidents have a review All failed large changes have a review (including things that should have been large) All failed Releases have a review We use a similar format for Team / Project retrospectives - certainly in atmosphere
  44. Timeline written on wall. Paper prompts at the top are the ‘priming’ bit and are read out at the start of the meeting. They are also emailed with the Incident Review invite. We use a similar format for Team / Project retrospectives - certainly in atmosphere
  45. State what we are here for: State our approach: A note from Martin Fowler on PRIMING http://martinfowler.com/bliki/PrimingPrimeDirective.html
  46. Open Minded: Everyone is expected to come to an incident review keen to learn new information and listen to the experiences and stories of their colleagues. It’s not acceptable to bring your pre-formulated, rigid ideas of what happened/causes/solutions. Go back in time: It’s critical we avoid using HINDSIGHT – we need to understand what information was available at the time when decisions were made and actions were performed. The best way to do this is to put yourself back in time – into the context of how things unfolded. No single ‘Root Cause’: In complex systems (of people interacting with technology) there is never a single root cause. We often have many contributing factors to how events unfold – lots of those contributing factors are present all the time, even when things go right. Stopping at one root cause will miss all this information. How not Why: Questions that start WHY (or even worse WHO) tend to force people to justify actions, to attribute and apportion blame. WHY focuses the inquiry on people, which is not what we want. We want to gather information about how events unfolded – asking HOW things appeared, changed, worked, WHAT happened next, WHAT was expected. Questions around HOW, WHAT, WHEN are much more effective for this.
  47. Blaming people: It’s a popular belief that we have basically safe systems and if you just sorted out the behaviour of a few ‘bad apples’ things would be OK. That’s not the way to improve safety and is generally a cop-out. Please see Agile Prime Directive for what we do believe. Human Error: As above, we are all on the same side trying to do the best job we can. Everyone has variable performance and we need systems/processes etc that are better able to accommodate and expect that. Arse Covering: Hiding information that could help us improve as an organisation would be a terrible symptom of something that is wrong with our company culture. We need to make every effort to remove fear of judgement/consequences etc. from Incident Reviews. We need everyone to be open and honest. Points scoring: It’s easy to use examples of when things go wrong to prove a point or win battles with others. This is generally cherry picking of information and unhelpful to us as an organisation. If unchallenged it will lead to more defensiveness, hiding/manipulation of data etc. ‘Trying Harder’: We will never take an action to ‘be more careful’ or ‘try harder’ not to break things. We all try pretty damn hard already and that is never the solution we are looking for. Talking over people: We can only have one conversation at a time if we are to get a shared understanding of what’s happened and what we can do about it.