The document discusses strategies for improving the experience of engineers who are on call for technical issues. It suggests making alerts meaningful by sending only those that someone can act on, and building support structures, such as runbooks that document procedures and a way to call for backup. It also advocates compensating engineers for the extra work, conducting blameless postmortems to learn from incidents, and shifting the culture away from a hero mentality toward teamwork. The overall goal is to prevent burnout among on-call staff.
19. Aaron Aldrich - @CrayZeigh
"IT SOUNDS PLAUSIBLE ENOUGH TONIGHT, BUT WAIT UNTIL TOMORROW. WAIT FOR THE COMMON SENSE OF THE MORNING."
H. G. Wells, The Time Machine
38.
"SOFTWARE DEVELOPMENT, IT TURNS OUT, IS A TEAM SPORT… AND WHAT’S WORSE, ENCOURAGING THE HERO MENTALITY LEADS TO CORROSIVE DYSFUNCTION IN SOFTWARE TEAMS."
Rob Mee, Pivotal Labs
41.
@CageData
WHAT IS A BLAMELESS POSTMORTEM?
▸ Team members are accountable but not responsible
▸ Complete Transparency
▸ Deeper look at circumstances
▸ What happened and how to improve it (specific details)
▸ Real conditions of failure in complex systems
@jasonhand
http://www.slideshare.net/jhand2/its-not-your-fault-blameless-post-mortems