There is no Root Cause - Emergent Behavior in Complex Systems


What went wrong? Why does this always happen? How can we ensure it Never Happens Again? For most of the internet age, engineering teams have focused on finding the cause of an outage. A belief existed, and persists, that all errors or behaviors can be traced back to a single causal entity. The Root Cause Analysis (RCA) is conducted in service of finding that entity and correcting it. By doing so, we have been taught, we prevent recurrence of the error in question.

Much of RCA thinking comes from manufacturing and electrical systems, where simple causality can exist: an oft-failing fuse is caused by poor wiring. In computing environments, there is rarely so simple a cause. Within even the simplest application nest dependencies, logic, bottlenecks, and inefficiency. By wrapping that application in an operating system, on a server, on a network, on the internet, managed by process and actioned by people, we add enough complexity to force us to reconsider the Root Cause Analysis approach.

Modern tools and practices, like DevOps, enable engineering teams to adopt significant complexity at relatively low operational cost. Once unthinkable, microservice architecture in a public cloud environment is now a common choice for new software projects. Consider, for a moment, the layers of complexity captured in that decision. Now consider how opaque the agents in those systems are to the operators (us).

Emergence is a phenomenon whereby larger entities arise through interactions among smaller or simpler entities. In theory, complex systems exhibit highly unpredictable behavior and generate surprising patterns. In practice, teams operating complex engineering systems always see deeply interrelated causality - a blend of people, process, and the systems themselves. So why do we still focus our after-action analysis on a Single Cause?

In this talk, we’ll explore these conflicting realities for incident management teams. Attendees will learn about the differences between Root Cause Analysis and other techniques such as the Postmortem. While this is a technical talk with examples of both simple and complex infrastructures, much time will be spent considering the impacts of people and process on those same systems. Attendees will leave with some actionable ideas to bring back to their teams to improve their own after-action analysis activities.

Speaker: Matthew Boeckman

Matthew is an 18-year veteran of building infrastructure and leading engineering teams. Despite his heavy Ops background, Matthew has been a longtime friend of Developers and considers DevOps his primary passion and focus. Most recently VP of Infrastructure at Craftsy, Matthew now owns Dryas.io, a consulting practice focused on DevOps, Cloud adoption, and startup growth strategy.



There is no Root Cause - Emergent Behavior in Complex Systems

1. There is No Root Cause: Emergent behavior in complex systems. Matthew Boeckman

2. Matthew Boeckman (@matthewboeckman): Developer Advocate, VictorOps; Technology Strategist, Dryas.io; 18 years (dev)Ops

3. VictorOps: Incident Management for DevOps teams. Incident ingestion, scheduling, routing, escalations, chatops, transformation, reporting

4. Root Cause Analysis: What went wrong? Why did that happen? Who was responsible? How can we prevent this from recurring?

5. Fishbone! It’s like a tree, but sideways. Branches: People, Process, Pipeline, Code, Systems, Data. Outcome: Something Bad Happened!

6. Simple Systems: “3 tiers should be enough tiers for anybody” - some guy, probably

7. Simple Systems: “I guess there’s more tiers?” - that guy

8. Simple Systems: “We can easily identify the cause of faults in our digital offerings” - same guy

9. Deployment Schedules: Let’s change once a year, then it will be easier to point fingers at Dev.

10. Playing the long game: There were some good reasons
● It took a long time to create requirements
● It took a long time to write software
● It took a very long time to deploy applications
● It took a really, really long time to test software
● Testing patches was hard
● Deploying patches was all or nothing
● Managing hardware was an entire department’s job
● Software and hardware changes often required orchestration (that was hard)

11. Playing the long game (same list as slide 10): There aren’t anymore

12. Root Cause = Static model, Binary Thinking
● GOOD: Working, Expected, Certain, Understood, Responsible, Uptime
● BAD: Broken, Problem, Disaster, Confused, Wrong, FAILURE

13. Simple Systems: “3 tiers should be enough tiers for anybody” - some guy, probably

14. So many tiers: “I thought we agreed on 3 tiers?”

15. What’s traceability, precious?

17. It’s not a tree...

18. … it’s a forest

19. Emergence and Complex Systems: “... refers to the existence or formation of collective behaviors — what parts of a system do together that they would not do alone.” [1]
Properties and behaviors of systems arise from both the fine structures that compose those systems and the interrelationships between the systems’ discrete parts.
[1] Bar-Yam, Concepts: Emergence

20. Root Cause Language

21. Emergence Language

22. Subtlety and Nuance: Our shared reality. High Complexity + Dramatic Change Vectors = Emergent Behavior

23. Are we doomed?

24. Cynefin

25. Cynefin
● Created by Dave Snowden @snowded
● Originally for managing IBM Intellectual Capital
● Draws on research in systems, complexity, network and learning theories

27. Simple: Sense, Categorize, Respond (known)

28. Complicated: Sense, Analyze, Respond (probable)

29. Complex: Probe, Sense, Respond (emergent)

30. Chaotic: Act, Sense, Respond (buckle up)

31. Disorder: Reduce, Analyze, Iterate (culture)

32. Analysis changes the game: Knowledge and practice move patterns towards more favorable quadrants.

33. Complacency erodes progress: Slacking off walks back progress.

34. Adopting Cynefin
● In the moment: What quadrant does this map to?
● In the PIR (post-incident review): How did we manage the pattern?
● In your sprint planning: What patterns can we manage clockwise?

35. Root Cause Analysis vs. Cynefin
● Root Cause Analysis: Simple Causality, Static Model, Binary Thinking, After-Action, Focus on Blame
● Cynefin: Dynamic, Expects Change, Embraces Emergence, Present in the Moment, Call to Action

36. Subtlety and Nuance: Binary no more. There is no broken.

37. “Uncertainty is an uncomfortable position. But certainty is an absurd one.” -Voltaire

38. Thank you! @matthewboeckman
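
As a rough illustration of the emergence idea on slides 19 and 22, here is a minimal Python sketch of a service that retries unserved requests on the next tick. Everything in it is an assumption made for illustration and not taken from the talk: the function name `degraded_ticks`, the tick-based model, and all of the numbers. The same one-tick capacity blip is applied to a lightly loaded and a heavily loaded system.

```python
def degraded_ticks(arrivals, capacity, blip_tick, total_ticks=30):
    """Count ticks that end with unserved requests waiting to be retried.

    Hypothetical toy model: every tick brings `arrivals` new requests,
    the service can handle `capacity` requests, and anything left over
    is retried on the following tick.
    """
    backlog = 0
    degraded = 0
    for tick in range(total_ticks):
        # The only injected fault: capacity drops to zero for one tick.
        available = 0 if tick == blip_tick else capacity
        demand = backlog + arrivals            # retries plus new requests
        backlog = max(0, demand - available)   # unserved work carries over
        if backlog > 0:
            degraded += 1
    return degraded


# Same fault, same capacity, different headroom.
light = degraded_ticks(arrivals=50, capacity=100, blip_tick=5)
heavy = degraded_ticks(arrivals=90, capacity=100, blip_tick=5)
print(light, heavy)  # prints "1 9"
```

In this toy model the blip alone accounts for one degraded tick. The nine-tick degradation appears only when the blip, the retry behavior, and the thin headroom interact, which is the sense in which no single one of them is "the" root cause.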

Editor's Notes

Dave Snowden
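
The Cynefin domains on slides 27 to 31 each pair a name with a three-step response sequence. A minimal Python sketch of that pairing as a lookup table might look like the following. The `Domain` enum, the `RESPONSES` table, and the `playbook` helper are hypothetical names introduced here. Only the domain names and step words come from the slides.

```python
from enum import Enum


class Domain(Enum):
    """Cynefin domains as named on the slides."""
    SIMPLE = "simple"
    COMPLICATED = "complicated"
    COMPLEX = "complex"
    CHAOTIC = "chaotic"
    DISORDER = "disorder"


# Response sequence per domain, copied from slides 27 to 31.
RESPONSES = {
    Domain.SIMPLE: ("sense", "categorize", "respond"),
    Domain.COMPLICATED: ("sense", "analyze", "respond"),
    Domain.COMPLEX: ("probe", "sense", "respond"),
    Domain.CHAOTIC: ("act", "sense", "respond"),
    Domain.DISORDER: ("reduce", "analyze", "iterate"),
}


def playbook(domain: Domain) -> str:
    """Return the response sequence for a domain as a readable string."""
    return " -> ".join(RESPONSES[domain])


print(playbook(Domain.COMPLEX))  # prints "probe -> sense -> respond"
```

A team could use a table like this during an incident to answer the "What quadrant does this map to?" question from slide 34. Deciding which domain applies remains a human judgment that the sketch does not attempt to automate.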
