(Blameless) 
post-mortems 
@GoVictorOps #VOwebinar 
@jasonhand
Jason Hand 
DevOps 
“Handyman” 
jason@VictorOps.com 
@jasonhand 
Tara Calihman 
Social Media 
Marketing Director 
@Tarable
A little about me… 
Dir. of Platform Support - AppDirect 
Dir. of Technical Support - Standing Cloud
Dir. of Operational Systems - AFI Supply 
Hiker, climber, brewer, runner, biker, boarder, surfer, 
painter, singer, reader, writer, picker, coder, racer, 
camper, volunteer …. all the usual “Colorado 1-upper” 
Alternative names
Also known as: (Note: Public & Internal)
Project Retrospectives
Post-mortem analysis
Post-project review
Quality Improvement Review
Project Analysis Review
After Action Review
Autopsy Review
Santayana Review
Touchdown Meeting
Learning Review
Post-mortem 
Defined 
What ? 
A process intended to inform improvements by 
determining aspects that were successful or 
unsuccessful. 
Post-mortem 
Defined 
When ? 
As soon as feasible after the Incident is resolved. 
Post-mortem 
Poll 
Who should be involved in a blameless post-mortem? 
A. Management 
B. The Dev & Ops teams 
C. Only those that played a part in the outage & 
resolution 
D. All of the above 
Post-mortem
Answer
Who ?
D. All of the above
i.e. Everybody
Post-mortem
Defined
Why ?
To communicate with your team
To understand what happened for learning and improving
Post-mortem
Defined
How ?
Talk about the incident timeline
Escalation steps
What was done to resolve the problem
Create a remediation plan
Make it available
The Three R’s ( simple format )
Regret: Acknowledgement and apology
Reason: Initial incident detection to resolution, including the so-called “root causes.”
Remedy: Actionable remediation items
– Dave Zwieback, VP Engineering - Next Big Sound
Moving from Reaction to Action 
(Remedy) 
Use SMART recommendations 
Specific 
Measurable 
Agreed Upon/Agreeable 
Realistic 
Timebound 
Blameless 
image from “Across the Universe” 
Cool story, bro 
2011 - Hired to Standing Cloud 
Cloud marketplace & automated deployment of 
apps 
Build Support team 
Provide Managed services 
– Sidney Dekker
“Reprimanding bad apples may 
seem like a quick and rewarding 
fix, but it’s like peeing in your 
pants. 
You feel relieved and perhaps even 
nice and warm for a little while, 
but then it gets cold and 
uncomfortable. 
And you look like a fool” 
What is a blameless 
post-mortem? 
Team members are accountable but not responsible 
Complete Transparency 
Deeper look at circumstances 
What happened and how to improve it (specific 
details) 
Real conditions of failure in complex systems 
Avoid counterfactuals 
“Your organization must continually affirm 
that individuals are NEVER the “root 
cause” of outages.” 
– Dave Zwieback 
Why 
vs 
How 
Paraphrased from “Fallible Humans” by Ian Malpass 
- DevOpsDays - Minneapolis 
source: http://www.indecorous.com/fallible_humans/ 
ETTO
(Efficiency-Thoroughness Trade-Off)
The trade-off between:
being efficient
vs
being thorough
“We can be thorough and really dig into the 
task at hand and understand it well but this 
takes time: 
It is inefficient.” 
- Ian Malpass 
Cause & Effect 
source: http://xkcd.com 
There are many factors that played a part in the 
problem 
“may be” 
How many times does the letter “F” appear in the following sentence?
Finished files are the result of years of scientific study combined with the experience of many years.
How many times do you see the letter F 
A. 3 
B. 4 
C. 5 
D. 6
Answer: 
6 
Cognitive Bias
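The answer can be confirmed mechanically. The following is a minimal sketch in Python, assuming the sentence reads as on the slide with the line-break hyphen in "re-sult" joined; the helper name `count_letter` is hypothetical and chosen only for illustration.

```python
# The sentence from the slide, written on one line (assumption: "re-sult" is "result").
sentence = (
    "Finished files are the result of years of scientific "
    "study combined with the experience of many years."
)


def count_letter(text: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in text."""
    return text.lower().count(letter.lower())


# Total number of F's, upper or lower case.
f_count = count_letter(sentence, "f")

# How many of those F's sit inside the short word "of".
of_count = sentence.lower().split().count("of")

print(f"F appears {f_count} times; {of_count} of them are in the word 'of'")
```

Half of the F's sit in the word "of", which is a common explanation for why readers undercount them.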
Stress 
& Cognitive Bias 
Is stress good or bad?
A. Good
B. Bad
Yerkes-Dodson Model 
source: The Human Side of Postmortems
Is stress good or bad?
Answer: Both
Reduce Stress?
Simulate many types of problems and outages as “practice” … build muscle memory
Evaluative Threat 
Being negatively 
judged plays a big role 
in stress 
What is stress surface?
Variables of a situation
Novel or unusual
Unpredictable
Controllability of the situation
Negative judgement (evaluative threats)
ALSO
Relationships
Health
Problems at home
Lack of sleep
Etc…
Capturing the 
Human-side 
Ask questions 
Stress Questionnaire 
0 = Never 1 = Almost Never 2 = Sometimes 
3 = Fairly Often 4 = Very Often 
During the outage, how often have you felt or thought 
that: 
The situation was novel or unusual? 
The situation was unpredictable? 
You were unable to control the situation? 
Others could judge your actions negatively? 
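The questionnaire above can be captured in a few lines of code. This is a minimal sketch, not a validated instrument: the scale and the four questions come from the slide, while the function name `summarize` and the choice to report a simple total and mean are assumptions made for illustration.

```python
from __future__ import annotations

# Answer scale from the slide.
SCALE = {
    0: "Never",
    1: "Almost Never",
    2: "Sometimes",
    3: "Fairly Often",
    4: "Very Often",
}

# Questions from the slide, in order.
QUESTIONS = (
    "The situation was novel or unusual?",
    "The situation was unpredictable?",
    "You were unable to control the situation?",
    "Others could judge your actions negatively?",
)


def summarize(responses: list[int]) -> dict:
    """Validate one participant's answers and return a simple summary.

    Assumption: a plain total and mean are used here; the slide does not
    say how answers should be aggregated.
    """
    if len(responses) != len(QUESTIONS):
        raise ValueError(f"expected {len(QUESTIONS)} answers, got {len(responses)}")
    for answer in responses:
        if answer not in SCALE:
            raise ValueError(f"answer {answer!r} is outside the 0-4 scale")
    total = sum(responses)
    return {
        "total": total,
        "max": max(SCALE) * len(QUESTIONS),
        "mean": total / len(responses),
        "labels": [SCALE[a] for a in responses],
    }


if __name__ == "__main__":
    # Hypothetical answers from one responder after an outage.
    print(summarize([3, 4, 2, 1]))
```

Keeping the per-question labels alongside the total makes it easier to see which variable (novelty, unpredictability, control, or judgement) contributed most.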
Why we DON’T punish 
Punishment de-incentivizes people from giving the details
Punishment practically guarantees a repeat of the problem
Understand why actions made sense (at the time) 
Create safety AND accountability 
Move away from idea of “individuals are problems” 
Create new “experts” 
Promoting from within 
Where do we start? 
The basics: 
• Document your timeline or log data 
• Document conversations 
• Leave room for notes 
• Mean time to resolution / Time calculations 
• Level of severity 
• Archive it for historical retrieval 
• Remediation. Make it actionable 
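The basics listed above map naturally onto a small record type. The sketch below is one possible shape, assuming Python dataclasses; every class, field, and function name (`TimelineEntry`, `PostMortem`, `mean_time_to_resolution`, and so on) is hypothetical and not taken from VictorOps or any other tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class TimelineEntry:
    """One timestamped event or log line from the incident."""
    at: datetime
    note: str


@dataclass
class PostMortem:
    """A minimal post-mortem record covering the basics from the slide."""
    title: str
    severity: int                      # level of severity (scale is an assumption)
    detected_at: datetime
    resolved_at: datetime
    timeline: list[TimelineEntry] = field(default_factory=list)
    conversations: list[str] = field(default_factory=list)
    notes: str = ""                    # room for free-form notes
    remediation_items: list[str] = field(default_factory=list)

    def time_to_resolution(self) -> timedelta:
        """Elapsed time between detection and resolution."""
        return self.resolved_at - self.detected_at


def mean_time_to_resolution(reports: list[PostMortem]) -> timedelta:
    """Average time to resolution across archived reports."""
    if not reports:
        raise ValueError("need at least one report")
    total = sum((r.time_to_resolution() for r in reports), timedelta())
    return total / len(reports)


if __name__ == "__main__":
    # Hypothetical incident used only to exercise the record.
    report = PostMortem(
        title="Example outage",
        severity=2,
        detected_at=datetime(2015, 1, 1, 10, 0),
        resolved_at=datetime(2015, 1, 1, 10, 45),
        remediation_items=["Add an alert on queue depth"],
    )
    print(report.time_to_resolution())
```

Storing each report in a structured form like this is what makes the archive searchable later and lets time calculations be repeated across incidents.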
Tools
Etsy’s Morgue
VictorOps Post-mortem Report
Internal Wiki
VictorOps Post-Mortem Report
Etsy’s Morgue Tool
Seek the truth 
Don’t blame others … 
Don’t blame yourself 
Thank You 
Questions ? 
Next Webinar: 
ChatOps Unplugged 
http://victorops.com/chatops-webinar 
Try VictorOps! 
Now What? 
See for yourself how we're solving 
the problem of post-mortem 
reporting, on-call scheduling, alert 
management and remote 
collaboration. Start your free trial 
or join our weekly product demo 
at www.victorops.com 
Post-mortem Guides
Resources
“The Human Side of Postmortems” - Dave Zwieback
“The Field Guide to Understanding Human Error” - Sidney Dekker
“A Look at Looking in the Mirror” - J. Paul Reed
“Fallible Humans” - Ian Malpass (http://www.indecorous.com/fallible_humans/)
“4 Questions to ask for an effective Technical Post Mortem” - Jeffrey O’Brien (http://www.maintenanceassistant.com/blog/4-questions-effective-technical-post-mortem/)
“Nine steps to IT post-mortem excellence” - Michael Krigsman (http://www.zdnet.com/blog/projectfailures/nine-steps-to-it-post-mortem-excellence/1069)
“Postmortem reviews: purpose and approaches in software engineering” - Torgeir Dingsøyr (http://www.uio.no/studier/emner/matnat/ifi/INF5180/v10/undervisningsmateriale/reading-materials/p08/post-mortems.pdf)
“Blameless PostMortems and a Just Culture” - John Allspaw (http://codeascraft.com/2012/05/22/blameless-postmortems/)
“What blameless really means” - Jessica Harllee (http://www.jessicaharllee.com/notes/what-blameless-really-means/)
“Each necessary, but only jointly sufficient” - John Allspaw (http://www.kitchensoap.com/2012/02/10/each-necessary-but-only-jointly-sufficient/)

WEBINAR: VictorOps Blameless Post-Mortems

  • 2. Jason Hand DevOps “Handyman” jason@VictorOps.com @jasonhand @GoVictorOps #VOwebinar @jasonhand
  • 3. Tara Calihman Social Media Marketing Director @GoVictorOps #VOwebinar @Tarable
  • 4. A little about me… Dir. of Platform Support - AppDirect Dir. of Technical Support - Standing Cloud Dir. of Operational Systems - AFI Supply Hiker, climber, brewer, runner, biker, boarder, surfer, painter, singer, reader, writer, picker, coder, racer, camper, volunteer …. all the usual “Colorado 1-upper” @GoVictorOps #VOwebinar @jasonhand
  • 5. Alternative names Also known as: (Note: Public & Internal) Project Retrospectives, Post-mortem analysis, Post-project review, Quality Improvement Review, Project Analysis Review, After Action Review, Autopsy Review, Santayana Review, Touchdown Meeting, Learning Review @GoVictorOps #VOwebinar @jasonhand
  • 6. Post-mortem Defined What ? A process intended to inform improvements by determining aspects that were successful or unsuccessful. @GoVictorOps #VOwebinar @jasonhand
  • 7. Post-mortem Defined When ? As soon as feasible after the Incident is resolved. @GoVictorOps #VOwebinar @jasonhand
  • 8. Post-mortem Poll Who should be involved in a blameless post-mortem? A. Management B. The Dev & Ops teams C. Only those that played a part in the outage & resolution D. All of the above @GoVictorOps #VOwebinar @jasonhand
  • 9. Post-mortem Answer D. All of the above i.e. Everybody Who ? @GoVictorOps #VOwebinar @jasonhand
  • 10. Post-mortem Defined To communicate with your team Why ? To understand what happened for learning and improving @GoVictorOps #VOwebinar @jasonhand
  • 11. Post-mortem Defined Talk about the incident timeline Escalation steps What was done to resolve the problem Create a remediation plan Make it available How ? @GoVictorOps #VOwebinar @jasonhand
  • 12. The Three R’s ( simple format ) Regret: Acknowledgement and apology. Reason: Initial incident detection to resolution, including the so-called “root causes.” Remedy: Actionable remediation items. Dave Zwieback, VP Engineering - Next Big Sound @GoVictorOps #VOwebinar @jasonhand
  • 13. Moving from Reaction to Action (Remedy) Use SMART recommendations Specific Measurable Agreed Upon/Agreeable Realistic Timebound @GoVictorOps #VOwebinar @jasonhand
  • 14. Blameless @GoVictorOps #VOwebinar image from “Across the Universe” #VOwebinar
  • 15. Cool story, bro 2011 - Hired to Standing Cloud Cloud marketplace & automated deployment of apps Build Support team Provide Managed services @GoVictorOps #VOwebinar @jasonhand
  • 16. “Reprimanding bad apples may seem like a quick and rewarding fix, but it’s like peeing in your pants. You feel relieved and perhaps even nice and warm for a little while, but then it gets cold and uncomfortable. And you look like a fool” – Sidney Dekker @GoVictorOps #VOwebinar @jasonhand
  • 17. What is a blameless post-mortem? Team members are accountable but not responsible Complete Transparency Deeper look at circumstances What happened and how to improve it (specific details) Real conditions of failure in complex systems Avoid counterfactuals @GoVictorOps #VOwebinar @jasonhand
  • 18. “Your organization must continually affirm that individuals are NEVER the “root cause” of outages.” – Dave Zwieback @GoVictorOps #VOwebinar @jasonhand
  • 19. Why vs How @GoVictorOps #VOwebinar
  • 20. Paraphrased from “Fallible Humans” by Ian Malpass - DevOpsDays - Minneapolis source: http://www.indecorous.com/fallible_humans/ @GoVictorOps #VOwebinar @jasonhand
  • 21. ETTO (Efficiency Thoroughness Trade Off) The trade off between: being efficient vs being thorough Efficient Thorough @GoVictorOps #VOwebinar @jasonhand
  • 22. “We can be thorough and really dig into the task at hand and understand it well but this takes time: It is inefficient.” - Ian Malpass @GoVictorOps #VOwebinar @jasonhand
  • 23. Cause & Effect source: http://xkcd.com There are many factors that played a part in the problem “may be” @GoVictorOps #VOwebinar @jasonhand
  • 24. How many times does the letter “F” appear in the following sentence? Finished files are the result of years of scientific study combined with the experience of many years. @GoVictorOps #VOwebinar @jasonhand
  • 25. How many times do you see the letter F @GoVictorOps #VOwebinar @jasonhand A. 3 B. 4 C. 5 D. 6
  • 26. @GoVictorOps #VOwebinar @jasonhand Answer: 6 Cognitive Bias
  • 27. Stress & Cognitive Bias @GoVictorOps #VOwebinar @jasonhand
  • 28. @GoVictorOps #VOwebinar @jasonhand Is stress good or bad? A.Good B.Bad
  • 29. Yerkes-Dodson Model @GoVictorOps #VOwebinar source: The Human Side of Postmortems @jasonhand
  • 30. @GoVictorOps #VOwebinar @jasonhand Is stress good or bad? Answer: Both
  • 31. Reduce Stress? … build muscle memory Simulate many types of problems and outages as “practice” … @GoVictorOps #VOwebinar @jasonhand #VOwebinar
  • 32. Evaluative Threat Being negatively judged plays a big role in stress @GoVictorOps #VOwebinar @jasonhand #VOwebinar
  • 33. What is a stress surface? Variables of a situation. Evaluative threats: Novel or unusual, Unpredictable, Controllable situation, Negative judgement. ALSO: Relationships, Health, Problems at home, Lack of sleep, Etc… @GoVictorOps #VOwebinar @jasonhand
  • 34. Capturing the Human-side Ask questions @GoVictorOps #VOwebinar @jasonhand
  • 35. Stress Questionnaire 0 = Never 1 = Almost Never 2 = Sometimes 3 = Fairly Often 4 = Very Often During the outage, how often have you felt or thought that: The situation was novel or unusual? The situation was unpredictable? You were unable to control the situation? Others could judge your actions negatively? @GoVictorOps #VOwebinar @jasonhand
  • 36. Why we DON’T punish De-incentivized to give the details Practically guarantees a repeat of the problem Understand why actions made sense (at the time) Create safety AND accountability Move away from idea of “individuals are problems” Create new “experts” @GoVictorOps #VOwebinar @jasonhand
  • 38. Promoting from within Where do we start? The basics: • Document your timeline or log data • Document conversations • Leave room for notes • Mean time to resolution / Time calculations • Level of severity • Archive it for historical retrieval • Remediation. Make it actionable @GoVictorOps #VOwebinar @jasonhand
  • 39. Tools: Etsy’s Morgue, VictorOps Post-mortem Report, Internal Wiki @GoVictorOps #VOwebinar @jasonhand
  • 40. VictorOps Post-Mortem Report Etsy’s Morgue Tool @GoVictorOps #VOwebinar
  • 41. Seek the truth Don’t blame others … Don’t blame yourself Thank You @GoVictorOps #VOwebinar @jasonhand
  • 42. Questions ? @GoVictorOps #VOwebinar @jasonhand
  • 43. Now What? Try VictorOps! See for yourself how we're solving the problem of post-mortem reporting, on-call scheduling, alert management and remote collaboration. Start your free trial or join our weekly product demo at www.victorops.com Post-mortem Guides. Next Webinar: ChatOps Unplugged http://victorops.com/chatops-webinar @GoVictorOps #VOwebinar @jasonhand
  • 44. Resources “The Human Side of Postmortems” - Dave Zwieback; “The Field Guide to Understanding Human Error” - Sidney Dekker; “A Look at Looking in the Mirror” - J. Paul Reed; “Fallible Humans” - Ian Malpass (http://www.indecorous.com/fallible_humans/); “4 Questions to ask for an effective Technical Post Mortem” - Jeffrey O’Brien (http://www.maintenanceassistant.com/blog/4-questions-effective-technical-post-mortem/); “Nine steps to IT post-mortem excellence” - Michael Krigsman (http://www.zdnet.com/blog/projectfailures/nine-steps-to-it-post-mortem-excellence/1069); “Postmortem reviews: purpose and approaches in software engineering” - Torgeir Dingsøyr (http://www.uio.no/studier/emner/matnat/ifi/INF5180/v10/undervisningsmateriale/reading-materials/p08/post-mortems.pdf); “Blameless PostMortems and a Just Culture” - John Allspaw (http://codeascraft.com/2012/05/22/blameless-postmortems/); “What blameless really means” - Jessica Harllee (http://www.jessicaharllee.com/notes/what-blameless-really-means/); “Each necessary, but only jointly sufficient” - John Allspaw (http://www.kitchensoap.com/2012/02/10/each-necessary-but-only-jointly-sufficient/) @GoVictorOps #VOwebinar @jasonhand
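
The "basics" slide (timeline, time calculations, severity, actionable remediation) and the stress questionnaire slide describe data that a post-mortem report could store. Below is a minimal Python sketch of such a record. Every name in it (`PostMortem`, `RemediationItem`, `STRESS_QUESTIONS`, `mean_time_to_resolution`) is hypothetical and chosen for illustration; it does not reflect how the VictorOps report or Etsy's Morgue work internally. The `is_actionable` check covers only the parts of SMART a program can see (an owner and a due date), which is an assumption about how a team might track follow-up.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical keys for the four questionnaire items, each scored
# 0 (never) to 4 (very often) as on the questionnaire slide.
STRESS_QUESTIONS = ("novel", "unpredictable", "uncontrollable", "judged")


@dataclass
class TimelineEntry:
    at: datetime
    note: str


@dataclass
class RemediationItem:
    description: str
    owner: str = ""                 # someone has agreed to take the item
    due: Optional[datetime] = None  # the item is timebound

    def is_actionable(self) -> bool:
        # Partial SMART check: an owner and a due date are both present.
        return bool(self.owner) and self.due is not None


@dataclass
class PostMortem:
    title: str
    severity: int
    detected_at: datetime
    resolved_at: datetime
    timeline: List[TimelineEntry] = field(default_factory=list)
    remediation: List[RemediationItem] = field(default_factory=list)
    # One dict per participant, collected independently of each other.
    stress_answers: List[Dict[str, int]] = field(default_factory=list)

    def time_to_resolution(self) -> timedelta:
        return self.resolved_at - self.detected_at

    def stress_summary(self) -> Dict[str, float]:
        # Average score per question; raises if no answers were collected.
        return {q: mean(a[q] for a in self.stress_answers)
                for q in STRESS_QUESTIONS}


def mean_time_to_resolution(reports: List[PostMortem]) -> timedelta:
    # Plain average of resolution times over a non-empty list of reports.
    total = sum((r.time_to_resolution() for r in reports), timedelta())
    return total / len(reports)
```

A record like this could be archived per incident, with `mean_time_to_resolution` computed over the archive and `stress_summary` reviewed alongside the timeline rather than attributed to any individual.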

Editor's notes

  1. Good afternoon everyone. Thank you for attending today’s webinar As you know, I’ll be presenting on the subject of Post-mortems (specifically… blameless)
  2. I’ll have a Q&A towards the end of the presentation, but please feel free to reach out to me any time after… Here are a few ways to connect with me.
  3. Good afternoon everyone. Thank you for joining us for today’s webinar – I’m Tara. I’m a Pisces that enjoys long hikes on deserted trails and… I’m honored to introduce our DevOps evangelist, Jason Hand, who will be presenting on the subject of Post-mortems (specifically… blameless)
  4. Dir. of Platform Support - AppDirect Dir. of Tech Support - Standing Cloud Dir. of Operational Systems - AFI Supply … where I started my professional career.
  5. Public vs Internal A post-mortem exists in many different formats across all industries. They can be commonly referred to as:
  6. Everyone here has a pretty good idea of what a post-mortem is. Let’s QUICKLY review. First of all. It’s totally common for organizations to hold post-mortems after a successful event. What we are talking about today are post-mortems related to an outage, so that we can focus on the idea of “blameless”. Definition beginning: - What happened (in detail) … the good … the bad … all of it.
  7. It’s important that everyone involved .. be fully recovered. Take a step back and get “some” rest. … but you don’t want to wait so long that important details begin to fade.
  8. Who should be involved in a blameless post-mortem?
  9. Ideally, the entire team takes part in the post-mortem. If that’s not possible, then you should have all team members that played a part in the outage and resolution.. as well as any senior people.. or other vital teams that need to know all the details of exactly what happened. Not specific enough? - Introduced the problem - Identified the problem - Responded to the problem - Debugged the problem Anyone else that is interested Keep in mind … we are participating … to LEARN
  10. We want to know what happened in as much detail as possible so we can learn and improve our systems and processes. …Once you begin leaning towards complete transparency .. you’re on your way towards a truly blameless post-mortem.
  11. Here’s a general suggestion on the “How” of a post-mortem Mention Dave Zwieback’s book “The Human Side of Post-mortems”
  12. Regret - an acknowledgement of the impact of the outage and an apology. (usually customer facing) Reason - a linear outage timeline.. from initial incident detection to resolution, including the so-called “root causes.” - Notice that he says “so called” root causes. More on that later. Remedy - a list of remediation items to ensure that this particular outage won’t repeat. - Let’s talk about Remedy for a minute as the others are pretty straight forward.
  13. - Are you using the SMART method? - Are you entering a JIRA ticket? How are you following up with real “ACTION”? - Those are the basics and overview of post-mortems
  14. Now, let’s focus on the blameless aspect.
  15. Let’s start with a story - Standing Cloud was a cloud marketplace for automated deployment of web apps. I was brought on to build a support program and provide basic managed services for customers. A new role for me .. AND my first startup. VERY exciting times for me. Tell my post-mortem story on losing customer data and how I felt like the “bad apple”
  16. Give audience a moment to read the quote. - Earlier this summer I attended Velocity - Santa Clara and one of the presentations I caught was titled: “A Look at Looking in the Mirror” by J. Paul Reed. This quote was included in his presentation but I liked it so much I had to include it as well. Blameless post-mortems. What is it exactly?
  17. - A blameless post-mortem means team members should be accountable but not responsible. This is a gray area. Define better. - Transparency: be open and honest about what took place. - What was the larger set of circumstances that caused the “incident” or “outage”? - The purpose of your post-mortem is not to put blame on anyone on the team; the purpose is to figure out what happened and how to improve it. Focus more on “How” rather than “Why”. The idea of a blameless post-mortem stems from the understanding that the real conditions of failure in the complex systems (that all of us are likely building or striving to build) are VERY real and play a HUGE role in how we approach an emergency situation. Counterfactuals are NOT allowed. A counterfactual is a statement about something that did not happen, so it is not useful in a post-mortem. Watch for words like “should” (i.e. .. you should have seen this problem) ..
  18. This is my favorite quote from Dave’s book. (That I mentioned earlier) - Earlier he mentioned “so called” root cause. People .. Humans .. Individuals … are never it! - Searching for a root cause is a dead end. There isn’t a single root cause! - You’re not going to find the root cause of a failure any easier than you’ll find the root cause of a success. If you think back to my unfortunate example, I was lucky to be part of a team that got that idea. They never blamed me. They never pointed fingers at someone who didn’t test something correctly. They simply looked for ways to improve. Improve the system. Improve the product. Improve the business.
  19. • Instead of asking why .. ask how. • Why: insinuates you are looking for a root cause. • How: brings us to the conditions that allowed the event to take place to begin with. Quote: “Cause is not something found in the rubble. Cause is created in the minds of the investigators” - Sidney Dekker
  20. Where do you stand on the following question? Is stress good or bad? Good Bad As a young athlete and musician, most of my “break out” performances were under high stress situations. Do you ever feel like you play at your competition’s level (sometimes)? I believe much of that has to do with stress. I finally found time to sit down and work on this presentation just this past weekend. The stress of the deadline forced me to pull together.
  21. That was a trick question. Stress can be good … up to a point. This diagram is extremely interesting … although I feel like I’ve known it for years. - Stress can sometimes be good. In fact, many claim that they work better under stress. - You can see in the diagram that simple tasks are much more resilient to the effects of stress than complex ones. Simple tasks would be things that are well-learned, practiced, and performed with little-to-no effort. Complex tasks will be more unpredictable, or a feeling of a lack of control over the situation.
  22. If any of you have ever participated in competitive sports or games.. Do you ever feel like you play at your competition’s level (sometimes)? I believe much of that has to do with stress. I finally found time to sit down and work on this presentation just this past weekend. The stress of the deadline forced me to pull together.
  23. Should we seek methods to reduce stress, despite what the Yerkes-Dodson model shows us? No. We manage it. - Netflix’s Chaos Monkey (part of their Simian Army) to simulate outages. - Develop a muscle memory of how to deal with problems, which can reduce stress levels when charged with dealing with actual outages you are already familiar with. ** Like fire drills in school. - Compare this to a guitar and the muscle memory that is developed over lots and lots of practice. You eventually just get to where your hands and fingers are doing all of the work with very little effort from your brain. - All of this indicates that you SHOULDN’T make an effort to eliminate stress, but rather manage it and use it as a tool. BUT don’t lose sight of the fact that it CAN and DOES play a role in outages. And keep in mind: Reducing the impact of stress through practice and developing this “muscle memory” doesn’t address the “evaluative threat”.
  24. “Evaluative threat”. What’s that? An example is the finger pointing and blaming of outages on specific people or teams. Organizations where postmortems are far from blameless, and where being “the root cause” of an outage could result in a demotion or getting fired… create larger stress surfaces. .. What kind of impact does that have on your team .. and as a result .. your product .. and business? How do we address these stress surfaces?
  25. What are stress surfaces? To a certain degree, it’s the evaluative threats we just mentioned. But it’s ALSO stress as a whole when you look at situations … due to many factors, such as … … all of these make up the stress surface.
  26. Advance slide
  27. Some companies will issue a questionnaire immediately after an outage to measure stress levels during the incident. - Team members are asked a series of questions independently from each other to avoid group think. - This is all rooted in the understanding that real conditions of failure in complex systems exist.. and finding ways to improve performance during outages can only be achieved by reducing their stress surface. - Create an environment where there is no fear of punishment
  28. - De-incentivizes everyone to give the details necessary to get an understanding of what actually took place. - Lack of understanding of how the accident occurred pretty much guarantees that it will repeat. If not with the original person, certainly someone else in the future. Because the facts weren’t allowed to surface. - It made sense to take that action (at that time) … why? - We want a culture where team members make an effort to find balance between safety AND accountability - Get away from idea that individuals, not situations, cause errors. We create a situation where people who do make mistakes become experts on it and can educate the rest of the team on how not to make them in the future. Think of it in terms of offering criminals immunity. We do it to get more information rather than just punish and stop the flow of information.
  29. - Becoming more accepting of failure at an organization level isn’t a new concept. I didn’t just come up with this hippie-dippie idea because I work for a startup in Boulder. M.J. here didn’t come up with it either. This also isn’t something that only really intuitive companies like tech startups are doing. We see it in all kinds of industries and companies of many sizes. Why? Because it works! As J. Paul Reed has said to me more than once… “There is science behind it”.
  30. What if your company isn’t doing post-mortems at all? Where do you start? Begin by documenting everything. … the log details, conversations, escalations .. all of it Have a place to keep notes Calculations on time (mean time to resolution) What was the severity of it? Save it somewhere with easy access Enter JIRA tickets
  31. - I know several of you use VictorOps so you are likely aware of the post-mortem report tool. - Those who aren’t should check out Etsy’s “Morgue” which is available as open source on Github. - Even if it’s an internal wiki, it’s important to use some sort of tool to build and store your post-mortem. Leave you with one final thought.