Chaos Engineering 101 by Russ Miles

An introductory talk on Chaos Engineering, featuring the Chaos Toolkit and ChaosIQ, which provides Chaos for Cloud Native Microservices.

The live-streamed video of the talk, given at WorldPay, is available on Twitter: https://www.pscp.tv/w/1DXGyEzMrRWGM?t=9

Published in: Software

  1. Chaos Engineering 101: The Why, What, How and Who of Chaos Engineering, or How and why you should start doing Chaos Engineering in your organisation today! Russ Miles, CEO, ChaosIQ.io
  2. Who Am I?
  3. Who Am I?
  4. Who Am I?
  5. But…
  6. Who Am I really?
  7. Who Am I really?
  8. Who Am I really?
  9. Who Am I really?
  10. Does this make me insane?
  11. Takeaway Point 1
  12. “Chaos Engineering has NOTHING to do with causing Chaos…”
  13. How a motorcyclist thinks
  14. Production…
  15. Hates you.
  16. Takeaway Point 2
  17. Failure is EVERYWHERE
  18. Types of Failure
  19. Hardware
  20. Functional
  21. State Transmission
  22. Latency
  23. Resource Exhaustion
  24. …and more.
  25. “Microsoft Doesn’t Learn…”
  26. But it’s far from just Microsoft…
  27. Takeaway Point 3
  28. Chaos Engineering is specifically about Availability
  29. Availability is what Matters
  30. Time Based Availability
  31. Time Based Availability: uptime / (uptime + downtime)
  32. Aggregate Availability
  33. Aggregate Availability: successful requests / total requests
  34. Measured from the outside-in
  35. But Availability is tricky…
  36. LOTS of Factors affect it
  37. Levels: Infrastructure; Platform; Applications; People, Practices & Process
  38. You are here
  39. Microservices & Cloud Native?
  40. You are here
  41. Building cloud native microservices…
  42. … that evolve quickly …
  43. (Which was the point)
  44. … then throw in Constant Change!
  45. You are here!
  46. Introducing Chaos Engineering
  47. Takeaway Point 4
  48. “The Chaos Engineering movement is as revolutionary to our industry that [sic] Extreme Programming was…” - @cunningleah
  49. Chaos Engineering exists to build Confidence in Availability in your rapidly evolving (i.e. chaotic), cloud native/microservices-based systems
  50. Takeaway Point 5
  51. How to DO Chaos Engineering?
  52. Build a Hypothesis around Steady State Behaviour
  53. Automated Canaries Point to Candidate Variables for Steady State
  54. Limit Blast Radius
  55. Vary Real-world Events
  56. Run Experiments in Production (ideally)
  57. Automate Experiments to Run Continuously
  58. Takeaway Point 6
  59. Chaos Engineering affects…
  60. Many Levels: Infrastructure; Platform; Applications; People, Practices & Process
  61. Takeaway Point 7
  62. Rules of Chaos “Club”
  63. Rule 1: It’s about “Learning”
  64. Rule 2: Chaos is not a surprise
  65. Rule 3: If you know the consequences, don’t do the experiment
  66. Some Chaos FAQ
  67. Isn’t this just “Testing”?
  68. Open Ended Questions
  69. Hypothesis, not Verification
  70. Discovery, not Verification
  71. Chaos is Not just Single-loop Learning (https://en.wikipedia.org/wiki/Double-loop_learning)
  72. Chaos Engineering is Double-loop Learning (https://en.wikipedia.org/wiki/Double-loop_learning)
  73. Chaos Engineering in an organisation?
  74. Chaos in an Organisation: Infrastructure; Platform; Applications; People, Practices & Process
  75. Chaos in an Organisation: Security; Infrastructure; Platform; Applications; People, Practices & Process
  76. Chaos in an Organisation: Security; Infrastructure; Platform; Applications; People, Practices & Process; Chaos
  77. Chaos in an Organisation: Security; Infrastructure; Platform; Applications; People, Practices & Process; Chaos; Game Days
  78. Chaos in an Organisation: Security; Infrastructure; Platform; Applications; People, Practices & Process; Chaos; Game Days; Automated Experiments
  79. Chaos in an Organisation: Security; Infrastructure; Platform; Applications; People, Practices & Process; Chaos; Game Days; Automated Experiments; Automated Experiments
  80. Chaos in an Organisation: Security; Infrastructure; Platform; Applications; People, Practices & Process; Chaos; Game Days; Automated Experiments; Automated Experiments; Automated Experiments
  81. Hard-Earned Tips when Adopting Chaos
  82. Drop the Term!
  83. “Can’t possibly do that here, we’re a …”
  84. Communicate and Limit Blast Radius
  85. Grow the Capability
  86. Starts as Community of Practice
  87. Collective Game Days
  88. Community of Interest
  89. Chaos-specific Supporting Group
  90. Enterprise Challenges with Chaos
  91. Game Days for Awareness and Immediate Improvement
  92. Collaborative Planning
  93. Collaborative Scheduling
  94. Get Approval!
  95. Execution Awareness
  96. Share Results & Insights
  97. Integrations with existing, custom systems
  98. “Local” Chaos?
  99. Briefly Introducing ChaosIQ.io
  100. Builds on the open source Chaos Toolkit (www.chaostoolkit.org)
  101. Collaborative Planning
  102. Scheduling
  103. Approval Workflow
  104. Execution Awareness
  105. Results Reports
  106. ChaosIQ Insights Available Q1 2018
  107. On-Premise and SaaS
  108. Integrations with existing, custom systems
  109. Who’s doing Chaos?
  110. Takeaways Revisited
  111. Chaos is not just for Netflix
  112. “Chaos Engineering has NOTHING to do with causing Chaos…”
  113. Failure is EVERYWHERE…
  114. …Learn from it!
  115. Chaos is about Availability
  116. What is Chaos Engineering? Chaos Engineering is a Discipline that directly addresses System Availability and affects:
       • People & Culture
       • Applications
       • Platforms
       • Infrastructure
  117. Principles and Process of Chaos Engineering
       • Build a Hypothesis around Steady State Behaviour
       • Limit Blast Radius
       • Vary Real-world Events
       • Run Experiments in Production (ideally)
       • Automate Experiments to Run Continuously
  118. If you’re running microservices in production…
  119. Grab the Book!
  120. Grab the Open Source Chaos Toolkit (www.chaostoolkit.org)
  121. Thanks! ChaosIQ - Chaos for Cloud Native
  122. Resources & Links
       • Principles of Chaos: http://principlesofchaos.org/
       • Open Source Chaos Toolkit: https://github.com/chaostoolkit
       • ChaosIQ Enterprise Cloud Native Chaos Engineering: http://www.chaosiq.io/
       • Course on Chaos Engineering: http://www.russmiles.com/fast-track-to-chaos-engineering.html
       • London Chaos Engineering Community Meetup: https://www.meetup.com/London-Chaos-Engineering-Community/
       • “Report from the SNAFUcatchers Workshop on Coping With Complexity”: http://stella.report
       • Chaos Engineering Community Group: https://groups.google.com/forum/#!topic/chaos-community
       • Free Chaos Engineering eBook from O’Reilly Media: http://www.oreilly.com/webops-perf/free/chaos-engineering.csp
       • Chaos Engineering Community Slack: https://slofile.com/slack/chaosengineering
       • A story towards Chaos with Zalando: https://www.slideshare.net/RaffaeleDiFazio/fallacies-of-distributed-computing-with-kubernetes-on-aws
       • Geek on a Harley tour blog: https://medium.com/russmiles
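
The two availability measures named on slides 30 to 33 can be written down directly. The sketch below is only an illustration: the function names and the sample figures are assumptions, not something taken from the talk.

```python
def time_based_availability(uptime: float, downtime: float) -> float:
    """Time based availability: uptime / (uptime + downtime).

    Both arguments must use the same unit (seconds, minutes, ...).
    """
    total = uptime + downtime
    if total <= 0:
        raise ValueError("uptime + downtime must be positive")
    return uptime / total


def aggregate_availability(successful_requests: int, total_requests: int) -> float:
    """Aggregate availability: successful requests / total requests."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return successful_requests / total_requests


# Hypothetical figures: a 30-day month (43,200 minutes) with 43.2 minutes of downtime,
# and 1,000 requests observed from the outside, 999 of which succeeded.
monthly = time_based_availability(uptime=43_200 - 43.2, downtime=43.2)
per_request = aggregate_availability(successful_requests=999, total_requests=1_000)
print(f"time based: {monthly:.4%}, aggregate: {per_request:.4%}")
```

The aggregate form fits the "measured from the outside-in" slide more naturally, because it counts what callers actually experienced rather than how long a process was up.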

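
The process on slide 117 (steady-state hypothesis, limited blast radius, a real-world event, automation) can be sketched as a small experiment runner. This is a toy simulation under stated assumptions: `ToyService`, `Experiment` and `run_experiment` are hypothetical names invented for illustration and are not the Chaos Toolkit or ChaosIQ API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


class ToyService:
    """Hypothetical service: round-robin routing over replicas, with no retries."""

    def __init__(self, replicas: int) -> None:
        self.healthy = [True] * replicas

    def kill(self, index: int) -> None:
        self.healthy[index] = False  # simulated real-world event: one replica dies

    def restore_all(self) -> None:
        self.healthy = [True] * len(self.healthy)

    def availability(self, requests: int = 100) -> float:
        # Aggregate availability: successful requests / total requests.
        ok = sum(self.healthy[i % len(self.healthy)] for i in range(requests))
        return ok / requests


@dataclass
class Experiment:
    title: str
    steady_state: Callable[[], bool]  # True while the system is within tolerance
    inject: Callable[[], None]        # the event to vary, with a limited blast radius
    rollback: Callable[[], None]      # always restores the system afterwards


def run_experiment(exp: Experiment) -> Dict[str, object]:
    # Do not start if the system is already outside its steady state.
    if not exp.steady_state():
        return {"title": exp.title, "status": "aborted", "deviated": False}
    try:
        exp.inject()
        held = exp.steady_state()
    finally:
        exp.rollback()
    return {"title": exp.title, "status": "completed", "deviated": not held}


service = ToyService(replicas=4)
experiment = Experiment(
    title="Losing one replica keeps availability at or above 99%",
    steady_state=lambda: service.availability() >= 0.99,
    inject=lambda: service.kill(0),  # blast radius: a single replica
    rollback=service.restore_all,
)
result = run_experiment(experiment)
print(result)
```

In this toy setup the hypothesis does not hold, because a quarter of the requests are routed to the dead replica. That deviation is the kind of finding an experiment is meant to surface. Running `run_experiment` from a scheduler or CI job would be one way to cover the "automate experiments to run continuously" step.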