The Journey of Chaos Engineering Begins with a Single Step

PagerDuty Summit 2016
Presenters: Bruce Wong, James Burns
https://www.pagerduty.com/pagerduty-summit-2016/

Heard of Netflix's Chaos Engineering and the Simian Army? Google's legendary DiRT exercises? This talk is the story of how Twilio got started with Chaos Engineering, the lessons learned, and the impact on our engineering culture.

Published in: Technology

  1. #PDSummit16 The Journey of Chaos Engineering Begins with a Single Step
  2. Bruce Wong, Senior Engineering Manager, Twilio. @bruce_m_wong https://www.linkedin.com/in/brucemwong
  3. (no text)
  4. 2009, 2012, 2014. http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html https://github.com/Netflix/SimianArmy http://techblog.netflix.com/2015/09/chaos-engineering-upgraded.html
  5. http://readwrite.com/2014/09/17/netflix-chaos-engineering-for-everyone/ http://techblog.netflix.com/2014/09/introducing-chaos-engineering.html
  6. https://www.twilio.com/
  7. https://customers.twilio.com/
  8. The journey of a thousand miles begins with a single step. -Lao Tzu
  9. James Burns, Tech Lead, Twilio. @1mentat https://www.linkedin.com/in/james-burns-7816a82
  10. Preparation: Pre-Launch Log Aggregation System. -Stage env -Synthetic Traffic
  11. The Master of Disaster: • Network Issues • Partitions • Thundering Herds • Cascading Failures • Resource Starvation • CPU • Memory • Disk IO • Network IO • Application Load
  12. > sudo halt
  13. Incident Start
  14. Impact?
  15. Post-Mortem
  16. (no text)
  17. Round 2: • Network Issues • Partitions • Thundering Herds • Cascading Failures • Resource Starvation • CPU • Memory • Disk IO • Network IO • Application Load
  18. > sudo halt
  19. Third-Party API Failure
  20. Well, that’s not what I expected to see
  21. Outcomes: Instrument, Instrument, Instrument. API SLAs. Architectural Change!
  22. Recap: • Start Simple • Instrumentation Gaps • Understand your dashboards • Prevent outages
  23. http://www.crisistextline.org/ http://polarisproject.org/befree-textline http://trekmedics.org/ https://www.twilio.org/
  24. When you wish upon a blue moon…
  25. Please provide feedback for this session by filling out the feedback survey
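
The "Start Simple" pattern from the slides (halt one host in a stage environment, then ask "Impact?") could be sketched roughly as below. This is only an illustration under stated assumptions, not the presenters' tooling: `run_experiment`, `check_health`, `inject`, and the host names are all hypothetical, and the failure is simulated in memory instead of actually running `sudo halt`.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ExperimentResult:
    target: str
    healthy_before: bool
    healthy_after: bool
    notes: List[str] = field(default_factory=list)


def run_experiment(
    hosts: List[str],
    check_health: Callable[[], bool],
    inject: Callable[[str], None],
    rng: Optional[random.Random] = None,
) -> ExperimentResult:
    """Take down one randomly chosen host and compare health before and after."""
    if rng is None:
        rng = random.Random()
    target = rng.choice(hosts)

    # Do not inject a failure into a system that is already unhealthy.
    if not check_health():
        return ExperimentResult(
            target, False, False, ["aborted: unhealthy before injection"]
        )

    # Hypothetical injection hook; a real run might halt the target host here.
    inject(target)
    healthy_after = check_health()
    notes = [] if healthy_after else ["impact observed: hold a post-mortem"]
    return ExperimentResult(target, True, healthy_after, notes)


# Simulated three-node stage environment (invented host names).
live_hosts = {"stage-log-1", "stage-log-2", "stage-log-3"}


def quorum_ok() -> bool:
    # Assumed health rule for the sketch: at least two nodes must be up.
    return len(live_hosts) >= 2


result = run_experiment(sorted(live_hosts), quorum_ok, live_hosts.discard)
print(result)
```

Passing the health check and the injection step in as callables keeps the experiment loop independent of any particular monitoring or orchestration system, so the same loop can be pointed at synthetic traffic first and real dashboards later.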
