Reanimating DevOps to Build Things that Work

  1. Reanimating DevOps. DevOps has been about putting software engineering know-how into operations. Without the reverse, it is just continuously deployed carcasses. Theo Schlossnagle - @postwait, Founder & CEO - @Circonus
  2. DevOps: a way for technology organizations to move faster with less risk. Continuous Integration and Deployment. A brief rant on the impedance between those. http://oncoscape.sttrcancer.org/
  3. Carcass may be a bit harsh
     • Shit breaks
     • All the time
     • Senior engineers know
       – the Internet is held together with string and hope
     • Because of this, CI/CD gives us tremendous power
       – to rapidly replace broken software in production
       – with other broken software, or perhaps not
  4. DevOps’ pompous assumption
     Software engineers:
     • Operations people struggle because they don’t have the tooling to automate their jobs. We can help them!
     Operations people:
     • We struggle because software engineers write software that is a fucking tire fire of failure, and when it breaks in production we never know why. ¯\_(ツ)_/¯
  5. Operations still needs observability
     • Stage 1:
       – Nothing
     • Stage 2:
       – Logs
       – Health checks (a.k.a. red-light, green-light ops)
     • Stage 3:
       – Passive metrics (analytical status reports)
     • Stage 4:
       – Dynamic tracing
       – Behavior models
  6. DTrace: Admit it. You still want it.
     • DTrace
       – Instrumentation of single systems.
       – Seamlessly crossed the user-space/kernel-space divide.
       – From user-space probes to hardware.
       – Simple awk-like interface.
       – Open source, free to consume.
       – Did not require cooperation.
     • eBPF
       – Finally provides plumbing on Linux.
       – Same breadth as DTrace.
       – No simple consumption tools, yet.
  7. Real Behavior. First, a short story about the revolution in web monitoring that most forgot happened.
  8. Where is RUM (real user monitoring) for systems?
  9. Inconvenient realities
     • 2000: collecting data at human scale became possible.
     • 2017:
       – More humans and more devices per human
       – The Internet of Things is starting hypergrowth
       – Systems “do a lot” for each “user-facing” interaction (100,000x to 1,000,000x; my speculation)
     • Extrinsic growth: 1 EB → 870 EB [1], 7% → 40% [2]
     • Intrinsic magnification: 100,000 to 1,000,000
     • 10mm to 100mm (WAG)
     • Moore’s law: 2^((2017−2000)/1.5) ≅ 2500
     [1] https://en.wikipedia.org/wiki/Internet_traffic
     [2] https://en.wikipedia.org/wiki/Global_Internet_usage
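The Moore's law figure on this slide can be checked with a few lines of arithmetic. The sketch below only reuses numbers that appear on the slide; the function and variable names are illustrative, and reading "7% → 40%" as a usage ratio is an assumption based on the cited source [2].

```python
# Back-of-the-envelope check of the slide's growth figures.
START_YEAR, END_YEAR = 2000, 2017
DOUBLING_PERIOD_YEARS = 1.5  # doubling period used on the slide


def moore_factor(start_year, end_year, doubling_period_years):
    """Capacity multiplier if capacity doubles every `doubling_period_years`."""
    return 2 ** ((end_year - start_year) / doubling_period_years)


capacity_growth = moore_factor(START_YEAR, END_YEAR, DOUBLING_PERIOD_YEARS)
traffic_growth = 870 / 1  # 1 EB -> 870 EB, from the slide
usage_growth = 40 / 7     # 7% -> 40%, from the slide (assumed to be usage share)

print(f"Moore's law capacity growth: ~{capacity_growth:,.0f}x")
print(f"Traffic growth: {traffic_growth:,.0f}x")
print(f"Usage growth: {usage_growth:.1f}x")
```

The exponent works out to about 2,580, which the slide rounds to 2500. The point of the comparison is that hardware growth of this order is small next to the slide's speculative magnification of 100,000x to 1,000,000x per user-facing interaction.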
  10. Compromise
     Surgical Tracing
     • For some small bit of systems activity
     • Complete dimensionality
     • Useful for debugging and troubleshooting
     Generalized Behavior
     • For all systems activity
     • Very limited dimensionality
     • Robust understanding of behavior (and changes thereof)
  11. To where?
     Surgical Tracing
     • eBPF will likely meet OpenTracing
       – (TBD user-tooling + zipkin + services)
       – Maybe honeycomb.io
     Generalized Behavior
     • Better user-level system instrumentation.
     • eBPF extraction of systems information.
     • More scalable/economical TSDBs
       – Circonus
  12. Rule 𝛌1: Never undervalue grace in failure. Crash landings should be both fast and controlled.
  13. What it means to fail quickly & safely
     • The scope of failure should collapse completely.
     • The time to failure should be measured in small multiples of normal service time.
     • Nothing outside the scope of failure should be impacted.
     https://www.youtube.com/watch?v=5SL1A2d2e7M
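One way to read "time to failure should be measured in small multiples of normal service time" is as a deadline on every call. The following is a minimal sketch under that reading, not the speaker's implementation; `NORMAL_SERVICE_TIME_S`, `DEADLINE_MULTIPLE`, `FailedFast`, and `call_with_deadline` are hypothetical names and values chosen for illustration.

```python
import concurrent.futures
import time

NORMAL_SERVICE_TIME_S = 0.05  # hypothetical typical latency of the call
DEADLINE_MULTIPLE = 3         # hypothetical "small multiple" of that latency

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


class FailedFast(Exception):
    """Raised when a call exceeds its deadline."""


def call_with_deadline(fn, *args):
    """Run fn(*args), giving up after a small multiple of normal service time."""
    deadline_s = NORMAL_SERVICE_TIME_S * DEADLINE_MULTIPLE
    future = _pool.submit(fn, *args)
    try:
        return future.result(timeout=deadline_s)
    except concurrent.futures.TimeoutError:
        # Best effort only: cancel() cannot interrupt a thread that is
        # already running, so the work may still finish in the background.
        future.cancel()
        raise FailedFast(f"no answer within {deadline_s:.3f}s") from None
```

The caller gets a fast, explicit failure instead of waiting on a stuck dependency. Because the abandoned work keeps running in its thread, this only bounds the caller's wait; collapsing the scope of failure completely would also require the callee to honor the deadline.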
  14. Rule 𝛌2: Autopsies: not just for medicine. Post-mortems are fundamental.
  15. Pragmatic analysis is required to understand failure’s true nature
     • Post-mortem analysis is critical
     • Stack traces
     • Forensic logs
     • Images (cores, dumps, etc.)
  16. Rule 𝛌3: The difference between a shock and electrocution is real. Use circuit breakers.
  17. Circuit breakers are designed to avoid cascading failure
     • it’s not all about, especially with microservices
     • protect yourselves and others
     • circuit breakers come in many types:
       – timing
       – queue depth
       – concurrency
     http://melissaomarkham.com
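A circuit breaker of the kind listed above can be sketched in a few dozen lines. This is an illustrative sketch only: it covers the timing and concurrency types (slow calls count as failures, and in-flight calls are capped) and leaves out queue depth. The class name, parameters, and thresholds are all assumptions, not something the slides specify.

```python
import threading
import time


class CircuitOpen(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after_s=5.0, slow_call_s=0.2,
                 max_concurrent=10, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.slow_call_s = slow_call_s
        self.max_concurrent = max_concurrent
        self._clock = clock
        self._lock = threading.Lock()
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed
        self._in_flight = 0

    def call(self, fn, *args, **kwargs):
        with self._lock:
            if self._opened_at is not None:
                if self._clock() - self._opened_at < self.reset_after_s:
                    raise CircuitOpen("circuit is open")
                # Cool-down elapsed: allow a trial call. One more failure
                # re-opens the circuit immediately.
                self._opened_at = None
                self._failures = self.max_failures - 1
            if self._in_flight >= self.max_concurrent:
                raise CircuitOpen("too many concurrent calls")
            self._in_flight += 1
        start = self._clock()
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record(ok=False)
            raise
        else:
            # Timing breaker: a call that succeeds too slowly is a failure.
            self._record(ok=(self._clock() - start) <= self.slow_call_s)
            return result
        finally:
            with self._lock:
                self._in_flight -= 1

    def _record(self, ok):
        with self._lock:
            if ok:
                self._failures = 0
                return
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
```

A caller wraps each outbound request, for example `breaker.call(fetch_profile, user_id)` with a hypothetical `fetch_profile`, and treats `CircuitOpen` as an immediate, cheap failure. While the circuit is open the struggling dependency receives no traffic from this caller, which is what keeps one failure from cascading.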
  18. Rule 𝛌4: You cannot understand what you cannot measure. Behavior is complex. Understand it.
  19. Don’t measure to assess availability; measure to understand
     • Build robust models of behavior
     • Understand performance changes
     • Don’t use averages
     • Don’t use percentiles alone
  20. Don’t measure to assess availability; measure to understand
     • Build robust models of behavior
     • Understand performance changes
     • Don’t use averages
     • Don’t use percentiles alone
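A small example helps show why averages, and percentiles taken alone, can mislead. The latencies below are made up for illustration (90 fast requests and 10 slow ones), and the `histogram` helper and its bucket edges are hypothetical.

```python
import statistics
from collections import Counter

# Hypothetical latencies in milliseconds: two very different behaviors.
latencies_ms = [10] * 90 + [1000] * 10

mean_ms = statistics.mean(latencies_ms)      # describes no real request
median_ms = statistics.median(latencies_ms)  # hides the slow mode entirely


def histogram(samples, bucket_edges):
    """Count samples per bucket; each sample goes to the first edge >= it."""
    counts = Counter()
    for sample in samples:
        for edge in bucket_edges:
            if sample <= edge:
                counts[edge] += 1
                break
        else:
            counts[float("inf")] += 1  # larger than every edge
    return dict(counts)


hist = histogram(latencies_ms, [1, 10, 100, 1000])

print(f"mean={mean_ms} ms, median={median_ms} ms")
print(f"histogram={hist}")
```

Here the mean is 109 ms, a latency no request experienced, and the median is 10 ms, which says nothing about the one request in ten that took a full second. The histogram keeps both modes visible, which is closer to the "robust model of behavior" the slide asks for.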
  21. Rule 𝛌5: It’s easy to demand perfection; it’s also stupid. Have a failure budget.
  22. Avoiding failure is simply impossible; expect and manage failure
     • use failure budgets
     • set expectations reasonably
     • define and reward successes on improvement and competency, not just uptime.
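The slides do not define how a failure budget is computed. One common convention, assumed here, is to derive it from an availability target over a fixed window. The function names, the 30-day window, and the 99.9% target in the usage note are illustrative assumptions.

```python
def failure_budget_minutes(availability_target, window_days=30):
    """Minutes of allowed failure in the window for a target such as 0.999."""
    if not 0 < availability_target <= 1:
        raise ValueError("availability_target must be in (0, 1]")
    window_minutes = window_days * 24 * 60
    return (1 - availability_target) * window_minutes


def budget_remaining_minutes(availability_target, failed_minutes, window_days=30):
    """Budget left after `failed_minutes` of failure; negative means overspent."""
    return failure_budget_minutes(availability_target, window_days) - failed_minutes
```

Under these assumptions a 99.9% target over 30 days allows about 43.2 minutes of failure, so `budget_remaining_minutes(0.999, 30)` leaves roughly 13 minutes. Framing it this way makes the expectation explicit: some failure is planned for, and the conversation shifts from "was there an outage" to "how much budget is left and how is it being spent".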
  23. Rule 𝛌6: Justice should be blind; operations should not. Instrumentation & Observability have no equals.
  24. For every “I wonder what X is right now?” in production, you must have answers
     • DTrace
     • eBPF
     • Instrument code for observability
     https://www.pinterest.com/pin/441775044670412234/
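"Instrument code for observability" can start very small. The sketch below, assuming a plain in-process registry, wraps a function so that call counts, error counts, and latencies are always available to answer "what is X right now?". `METRICS`, `instrumented`, and `lookup_user` are hypothetical names; a real system would export these values to a metrics store rather than keep them in a dictionary.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-process registry, keyed by metric name.
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "latencies_s": []})


def instrumented(name):
    """Decorator that records calls, errors, and latency under `name`."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception:
                METRICS[name]["errors"] += 1
                raise
            finally:
                # Runs on success and on failure, so every call is counted.
                metric = METRICS[name]
                metric["calls"] += 1
                metric["latencies_s"].append(time.perf_counter() - start)
        return wrapper
    return decorator


@instrumented("lookup_user")
def lookup_user(user_id):
    """Hypothetical application function used only to show the decorator."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return {"id": user_id}
```

Keeping the raw latencies rather than a running average follows the earlier advice against averages: the full distribution can be turned into a histogram later. Unlike DTrace or eBPF, this approach requires cooperation from the code's authors, which is the talk's closing point.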
  25. Thanks. Software engineers can deliver us observability… if they choose to.
