SlideShare une entreprise Scribd logo
1  sur  69
Télécharger pour lire hors ligne
Choose Your Own
Adventure:
Chaos Engineering
Nora Jones, Senior Chaos Engineer
@nora_js
In this talk
● Choosing your own adventure with Chaos
● Phases of Chaos
● Road to cultural acceptance
● Alternating between anecdotes and advice (when I do,
you’ll see a “Story” icon)
@nora_js
In this talk
● Choosing your own adventure with Chaos
● Phases of Chaos
● Road to cultural acceptance
● Alternating between anecdotes and advice (when I do,
you’ll see a “Story” icon)
@nora_js
Known ways of testing for
availability
● Unit Tests
● Regression Tests
● Integration Tests
● Chaos Engineering
“I want to emphasize that both
sides of the equation
[unit/regression/integration
testing side and Chaos side] are
required to get you the level of
availability you want.”
--Haley Tucker, Netflix
Chaos Engineering
You can’t keep blaming your cloud
provider
@nora_js
Why is there a fear of Chaos when it’s
inevitable?
Computers are
complicated and they will
break.
Meet “Chaos Carol”
Where is Carol starting her Chaos?
Phase 1: Introducing the Chaos
Start with a steady state
● Define “normal” system and business behavior for your
services
● Determine what the system architecture looks like at a high
level
@nora_js
Microservices
There isn’t always money in microservices
Randomly turn things
off?
Recreate things that
already happened?
Phase 1.1: Graceful
Restarts and
Degradation (start out
small)
Let people know?
Let the Chaos run
automatically?
Microservices
Working on Chaos experiments is a quick way to meet your
new colleagues. Do it tactfully.
Socialization
Socialization
● Tends to be harder than implementation.
● Part of one’s job as an engineer developing internal tools
is to understand your customer and their needs.
● Relate your Chaos experiments to automated tests, to
SLAs and ultimately, to the customer experience.
Culture & Chaos
Chaos doesn’t cause
problems, it reveals
them.
When your customers
are your coworkers.
Internal Tools: Selling 101
● Focus more on asking the questions, rather than
answering them.
● Find customers willing to try first. Then share their stories.
● Be honest. Don’t make false promises about what Chaos
will do.
Monitoring
Monitoring
● Leverage the tools you have.
● If you don’t monitor and measure the Chaos, how can you improve? And how
do you know it is working?
● Look at your incidents or JIRA tickets recently. Have they decreased from
when you started Chaos testing?
● Monitor culture around Chaos too. Has the idea of it improved? Are you
tracking adoption rates? Successes?
Don’t lose sight of your
company’s customers.
Strongly consider customer impact
with approaching your Chaos testing
and proceed with caution where
appropriate.
Phase 2: Can we cause a
cascading failure?
Cascading failures often lie dormant for
a long time until they are triggered by
an unusual set of circumstances.
Phase 3: Building a Failure
Injection Library
https://github.com/norajones/FailureInjectionLibrary
Types of Chaos Failures
Types of Chaos Failures
Criteria&API
Phase 4: Chaos
Automation Platform
“ChAP”
ChAP
● Designed to overcome the problems of FIT (failure
injection testing)
● Focused on minimizing blast radius
● Concentrates failures onto dedicated instances
● More: orchestration, segmentation, automation, and safety
@nora_js
ChAP
ChAP Goal: Chaos all the
things and run all the
time.
Phase 5: Targeted Chaos
Phase 5: Targeted Chaos
Kafka
Targeted Chaos: Kafka Problems
● Monitoring
● Dealing with offsets, especially during geo replication
efforts
● High consumer read levels
@nora_js
Targeted Chaos: Kafka Ideas
● Complete topic deletion
● Partial Topic Deletion
● Feeding the consumers bad offsets
● Random Packet Drops
● High Load on Topics
● Deleting segments, random and structured
@nora_js
It’s important to have a steady state with Targeted
Chaos before you begin.
Record Chaos Success Stories
(especially important during
adoption)
@nora_js
“We ran a chaos experiment which
verifies that our fallback path works
(crucial for our availability) and it
successfully caught a issue in the
fallback path and the issue was
resolved before it resulted in any
availability incident!”
“While [failing calls] we discovered an increase in
license requests for the experiment cluster even
though fallbacks were all successful. This likely
means that whoever was consuming the fallback
was retrying the call, causing an increase in
license requests.”
@nora_js
● Know your company’s culture.
● Set goals for each level of Chaos adoption
you expect.
● Define success criteria.
Engagement Guides
@nora_js
Should you develop
experiments for the
service teams?
Let them do it on their
own?
Takeaways
● Pervasive cultural patterns play out in
advocating for Chaos.
● There will be “adventure” choices you need to
make when choosing your Chaos.
● Measure your metrics for business and cultural
success.
@nora_js
Questions?
@nora_js
chaos@netflix.com

Contenu connexe

Tendances

Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Ana Medina
 

Tendances (20)

chaos-engineering-Knolx
chaos-engineering-Knolxchaos-engineering-Knolx
chaos-engineering-Knolx
 
Chaos Engineering: Why the World Needs More Resilient Systems
Chaos Engineering: Why the World Needs More Resilient SystemsChaos Engineering: Why the World Needs More Resilient Systems
Chaos Engineering: Why the World Needs More Resilient Systems
 
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
 
Chaos engineering
Chaos engineering Chaos engineering
Chaos engineering
 
Principles Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering HamburgPrinciples Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering Hamburg
 
Chaos Engineering for Docker
Chaos Engineering for DockerChaos Engineering for Docker
Chaos Engineering for Docker
 
Chaos Engineering 101: A Field Guide
Chaos Engineering 101: A Field GuideChaos Engineering 101: A Field Guide
Chaos Engineering 101: A Field Guide
 
Chaos engineering & Gameday on AWS
Chaos engineering & Gameday on AWSChaos engineering & Gameday on AWS
Chaos engineering & Gameday on AWS
 
Chaos Engineering with Gremlin Platform
Chaos Engineering with Gremlin PlatformChaos Engineering with Gremlin Platform
Chaos Engineering with Gremlin Platform
 
Running a High-Performance Kubernetes Cluster with Amazon EKS (CON318-R1) - A...
Running a High-Performance Kubernetes Cluster with Amazon EKS (CON318-R1) - A...Running a High-Performance Kubernetes Cluster with Amazon EKS (CON318-R1) - A...
Running a High-Performance Kubernetes Cluster with Amazon EKS (CON318-R1) - A...
 
Introduction to Chaos Engineering with Microsoft Azure
Introduction to Chaos Engineering with Microsoft AzureIntroduction to Chaos Engineering with Microsoft Azure
Introduction to Chaos Engineering with Microsoft Azure
 
Intro to AWS Developer Tools, featuring AWS CodeStar
Intro to AWS Developer Tools, featuring AWS CodeStarIntro to AWS Developer Tools, featuring AWS CodeStar
Intro to AWS Developer Tools, featuring AWS CodeStar
 
AWS CodeCommit, CodeDeploy & CodePipeline
AWS CodeCommit, CodeDeploy & CodePipelineAWS CodeCommit, CodeDeploy & CodePipeline
AWS CodeCommit, CodeDeploy & CodePipeline
 
POST/CON 2019 Workshop: Testing, Automated Testing, and Reporting APIs with P...
POST/CON 2019 Workshop: Testing, Automated Testing, and Reporting APIs with P...POST/CON 2019 Workshop: Testing, Automated Testing, and Reporting APIs with P...
POST/CON 2019 Workshop: Testing, Automated Testing, and Reporting APIs with P...
 
A Journey from Hexagonal Architecture to Event Sourcing - SymfonyCon Cluj 2017
A Journey from Hexagonal Architecture to Event Sourcing - SymfonyCon Cluj 2017A Journey from Hexagonal Architecture to Event Sourcing - SymfonyCon Cluj 2017
A Journey from Hexagonal Architecture to Event Sourcing - SymfonyCon Cluj 2017
 
Sonar Metrics
Sonar MetricsSonar Metrics
Sonar Metrics
 
CI/CD pipelines on AWS - Builders Day Israel
CI/CD pipelines on AWS - Builders Day IsraelCI/CD pipelines on AWS - Builders Day Israel
CI/CD pipelines on AWS - Builders Day Israel
 
Continuous Quality with Postman
Continuous Quality with PostmanContinuous Quality with Postman
Continuous Quality with Postman
 
Postman: An Introduction for Developers
Postman: An Introduction for DevelopersPostman: An Introduction for Developers
Postman: An Introduction for Developers
 
The Case for Chaos
The Case for ChaosThe Case for Chaos
The Case for Chaos
 

Similaire à Choose your own adventure Chaos Engineering - QCon NYC 2017

An Overview of automated testing (1)
An Overview of automated testing (1)An Overview of automated testing (1)
An Overview of automated testing (1)
Rodrigo Lopes
 
A Rapid Introduction to Rapid Software Testing
A Rapid Introduction to Rapid Software TestingA Rapid Introduction to Rapid Software Testing
A Rapid Introduction to Rapid Software Testing
TechWell
 
Agile testing for mere mortals
Agile testing for mere mortalsAgile testing for mere mortals
Agile testing for mere mortals
Dave Haeffner
 

Similaire à Choose your own adventure Chaos Engineering - QCon NYC 2017 (20)

Exploratory Testing Explained
Exploratory Testing ExplainedExploratory Testing Explained
Exploratory Testing Explained
 
Pivotal APJ Security Chaos Engineering
Pivotal APJ Security Chaos EngineeringPivotal APJ Security Chaos Engineering
Pivotal APJ Security Chaos Engineering
 
An Overview of automated testing (1)
An Overview of automated testing (1)An Overview of automated testing (1)
An Overview of automated testing (1)
 
RSA Conference APJ 2019 DevSecOps Days Security Chaos Engineering
RSA Conference APJ 2019 DevSecOps Days Security Chaos EngineeringRSA Conference APJ 2019 DevSecOps Days Security Chaos Engineering
RSA Conference APJ 2019 DevSecOps Days Security Chaos Engineering
 
Technical Excellence Doesn't Just Happen--Igniting a Craftsmanship Culture
Technical Excellence Doesn't Just Happen--Igniting a Craftsmanship CultureTechnical Excellence Doesn't Just Happen--Igniting a Craftsmanship Culture
Technical Excellence Doesn't Just Happen--Igniting a Craftsmanship Culture
 
A real-life overview of Agile and Scrum
A real-life overview of Agile and ScrumA real-life overview of Agile and Scrum
A real-life overview of Agile and Scrum
 
A Rapid Introduction to Rapid Software Testing
A Rapid Introduction to Rapid Software TestingA Rapid Introduction to Rapid Software Testing
A Rapid Introduction to Rapid Software Testing
 
A Rapid Introduction to Rapid Software Testing
A Rapid Introduction to Rapid Software TestingA Rapid Introduction to Rapid Software Testing
A Rapid Introduction to Rapid Software Testing
 
Software Defects and SW Reliability Assessment
Software Defects and SW Reliability AssessmentSoftware Defects and SW Reliability Assessment
Software Defects and SW Reliability Assessment
 
Testing in agile
Testing in agileTesting in agile
Testing in agile
 
Test case design techniques
Test case design techniquesTest case design techniques
Test case design techniques
 
Test case design techniques
Test case design techniquesTest case design techniques
Test case design techniques
 
Agile testing for mere mortals
Agile testing for mere mortalsAgile testing for mere mortals
Agile testing for mere mortals
 
A Rapid Introduction to Rapid Software Testing
A Rapid Introduction to Rapid Software TestingA Rapid Introduction to Rapid Software Testing
A Rapid Introduction to Rapid Software Testing
 
Scrum and-xp-from-the-trenches 01 intro & backlog
Scrum and-xp-from-the-trenches 01 intro & backlogScrum and-xp-from-the-trenches 01 intro & backlog
Scrum and-xp-from-the-trenches 01 intro & backlog
 
SRVision 2019, Utrecht: Swarming and Cynefin
SRVision 2019, Utrecht: Swarming and CynefinSRVision 2019, Utrecht: Swarming and Cynefin
SRVision 2019, Utrecht: Swarming and Cynefin
 
Exploratory testing workshop
Exploratory testing workshopExploratory testing workshop
Exploratory testing workshop
 
! Testing for agile teams
! Testing for agile teams! Testing for agile teams
! Testing for agile teams
 
Worst practices in software testing by the Testing troll
Worst practices in software testing by the Testing trollWorst practices in software testing by the Testing troll
Worst practices in software testing by the Testing troll
 
OWASP AppSec Global 2019 Security & Chaos Engineering
OWASP AppSec Global 2019 Security & Chaos EngineeringOWASP AppSec Global 2019 Security & Chaos Engineering
OWASP AppSec Global 2019 Security & Chaos Engineering
 

Dernier

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Dernier (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

Choose your own adventure Chaos Engineering - QCon NYC 2017