PID Loops and the Art of Keeping Systems Stable

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2m6VgNK.

Colm MacCárthaigh shows what PID loops look like in the context of modern systems, and how exponential backoff, flow-control, and other techniques can be wielded to build self-healing systems. Filmed at qconnewyork.com.

Colm MacCárthaigh is an engineer at Amazon Web Services. For just over ten years he has been building some of the largest services at AWS, including Amazon EC2, S3, ELB, CloudFront, and Route53. He is also an active Open Source contributor and is the main author of Amazon s2n, AWS's Open Source implementation of TLS/SSL.

Published in: Technology

  1. PID loops and the art of keeping systems stable. Colm MacCárthaigh, @colmmacc, 2019-06-24. © 2019, Amazon Web Services, Inc. or its Affiliates.
  2. InfoQ.com: News & Community Site
        • 750,000 unique visitors/month
        • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese)
        • Post content from our QCon conferences
        • News 15-20 / week
        • Articles 3-4 / week
        • Presentations (videos) 12-15 / week
        • Interviews 2-3 / week
        • Books 1 / month
     Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/pid-loops/
  3. Presented at QCon New York, www.qconnewyork.com
        • Purpose of QCon: to empower software development by facilitating the spread of knowledge and innovation
        • Strategy: practitioner-driven conference designed for YOU: influencers of change and innovation in your teams
            • speakers and topics driving the evolution and innovation
            • connecting and catalyzing the influencers and innovators
        • Highlights
            • attended by more than 12,000 delegates since 2007
            • held in 9 cities worldwide
  4. Control Theory: Where the fruit is hanging so low IT IS TOUCHING THE GROUND
  5. META
  6. [no text]
  7. [no text]
  8. Observe / Present / Feedback / React
  9. Control Theory
  10. Prior Art
  11. Prior Art
  12. Control Theory and PID loops
        Comes up in the context of …
        • Autoscaling and placement: Instances, Storage, Network, etc.
        • Fairness algorithms: TCP, Queues, Throttling
        • Systems stability
  13. The Furnace
  14. The Furnace
  15. The Furnace: Measure, React
  16. The Furnace [plot: error over time]
  17. The Furnace [plot: error over time]
  18. The Furnace [plot: error over time]
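
The furnace slides carry no formulas in this transcript, so the following is only a minimal sketch of a textbook PID loop, not the speaker's code. The toy furnace model, the gains, and every name (`PID`, `simulate_furnace`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Textbook PID controller: output = kp*e + ki*integral(e) + kd*de/dt."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measured: float, dt: float) -> float:
        error = setpoint - measured            # measure first
        self.integral += error * dt            # I: accumulated past error
        if self.prev_error is None:
            derivative = 0.0                   # no history on the first sample
        else:
            derivative = (error - self.prev_error) / dt  # D: error trend
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate_furnace(controller: PID, setpoint: float = 200.0,
                     ambient: float = 20.0, steps: int = 2000,
                     dt: float = 0.1) -> float:
    """Hypothetical furnace: heat gain scales with power, loss with (temp - ambient)."""
    temp = ambient
    for _ in range(steps):
        # Heater power is clamped to 0..100 percent (an assumed actuator limit).
        power = max(0.0, min(100.0, controller.update(setpoint, temp, dt)))
        temp += (0.5 * power - 0.1 * (temp - ambient)) * dt
    return temp


if __name__ == "__main__":
    print(simulate_furnace(PID(kp=2.0, ki=0.5, kd=0.1)))   # settles near 200
    print(simulate_furnace(PID(kp=2.0, ki=0.0, kd=0.0)))   # P-only: stays below 200
```

In this toy model the proportional-only run settles below the setpoint, and the integral term is what removes that residual error. The sketch omits anti-windup, which a real controller with a clamped actuator would usually need.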
  19. Autoscaling [plot: error over time]
  20. Autoscaling: Measure, React
  21. Autoscaling: forecasting and fancy integrals!
        • Any signal can be processed with Fourier Analysis to find underlying constituent frequencies
        • Real-world operational systems often have strong daily, weekly, annual cycles, etc.
        • Holt-Winters Forecasting can simulate these cycles into the future
        • Machine Learning can do even better!
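
As a rough illustration of the Holt-Winters idea on this slide, here is a minimal additive (level + trend + seasonal) sketch in plain Python. The initialization scheme, the smoothing parameters, and the function name are assumptions; a production system would more likely use an established forecasting library.

```python
def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast (simplified initialization, for illustration)."""
    m = season_len
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    first, second = series[:m], series[m:2 * m]
    level = sum(first) / m                          # average of the first season
    trend = (sum(second) - sum(first)) / (m * m)    # average per-step change
    seasonal = [y - level for y in first]           # offsets within one cycle

    for t, y in enumerate(series):
        s = seasonal[t % m]
        prev_level = level
        # Smooth the de-seasonalized observation against the previous estimate.
        level = alpha * (y - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (y - level) + (1 - gamma) * s

    n = len(series)
    # Project level and trend forward and re-apply the matching seasonal offset.
    return [level + (h + 1) * trend + seasonal[(n + h) % m]
            for h in range(horizon)]


if __name__ == "__main__":
    # Hypothetical load with a repeating 4-step cycle and no trend.
    history = [10, 20, 30, 20] * 3
    print(holt_winters_additive(history, 4, alpha=0.3, beta=0.1, gamma=0.2, horizon=4))
```

On a perfectly periodic input like the one above, the forecast simply repeats the cycle, which is the property an autoscaler would lean on to provision ahead of a daily or weekly peak.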
  22. Autoscaling: forecasting and fancy integrals!
  23. Placement and fairness
  24. [no text]
  25. X-Ray Vision 1: Open Loops
  26. X-Ray Vision: Open Loops
        # launch 10 instances
        for (i = 0; i < 10; i++)
            instance[i] = ec2_launch_instance();
        # wait a minute
        sleep(60);
        # register the instances
        for (i = 0; i < 10; i++)
            register_instance(instance[i]);
  27. X-Ray Vision: Open Loops
        • A surprising number of real-world systems are Open Loops
        • Potential reasons:
            • Organic out-growth from scripts
            • Imperative programming: “Do this, then do this” is very natural
            • Infrastructure is very, very reliable these days
            • Infrequent actions
  28. X-Ray Vision: Open Loops
  29. X-Ray Vision: Open Loops
  30. X-Ray Vision: Open Loops
        • Closing loops:
            • Embrace “Measure first. Then react.”
            • Measure a lot of things. Check everything you can think of.
            • Avoid infrequent operations; make them more frequent where possible.
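
One way to picture "measure first, then react" is a reconcile loop that compares actual state with desired state on every pass and acts only on the difference. The sketch below is an assumption-laden illustration: `FakeFleet` is a hypothetical in-memory stand-in for a provisioning API, not a real SDK, and a real loop would run continuously with a pause between passes.

```python
class FakeFleet:
    """Hypothetical in-memory stand-in for a real provisioning API."""

    def __init__(self, fail_first_launches=0):
        self.running = set()
        self.registered = set()
        self._next_id = 0
        self._failures_left = fail_first_launches

    def launch(self):
        if self._failures_left > 0:
            self._failures_left -= 1   # launch fails; an open loop would not notice
            return
        self.running.add(self._next_id)
        self._next_id += 1

    def register(self, instance_id):
        self.registered.add(instance_id)


def reconcile(fleet, desired):
    """One pass: measure actual state, then act only on the gap."""
    missing = desired - len(fleet.running)
    for _ in range(max(0, missing)):
        fleet.launch()
    for instance_id in fleet.running - fleet.registered:
        fleet.register(instance_id)
    # Re-measure to report whether the fleet has converged.
    return len(fleet.running) == desired and fleet.running == fleet.registered


def converge(fleet, desired, max_passes=10):
    """Repeat reconcile passes until converged or the pass budget runs out."""
    for _ in range(max_passes):
        if reconcile(fleet, desired):
            return True
    return False


if __name__ == "__main__":
    fleet = FakeFleet(fail_first_launches=3)
    print(converge(fleet, desired=10), len(fleet.registered))
```

Because each pass starts from a measurement, the three failed launches are simply picked up on the next pass, whereas the launch, sleep, register script would have tried to register instances that never existed.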
  31. X-Ray Vision: Open Loops
  32. X-Ray Vision: Open Loops
        • Measure-first systems tend to be more naturally declarative
        • In general, declarative systems are easier to formally verify
        • TLA+, F*, Coq, SAW/Cryptol
  33. X-Ray Vision 2: Power Laws
  34. Power Laws
  35. Power Laws
  36. Power Laws
  37. Power Laws
  38. Power Laws
  39. Power Laws
        • First, compartmentalize.
        • More compartments means relatively smaller blast radius.
        • Many real-world control systems reflect this lesson of scale.
        • What next?
  40. Power Laws
        • Exponential Back-off
            • Brings our own power-law to the table
        • Rate-limiters
            • Simple token buckets can be incredibly effective
        • Working Backpressure
        • AWS SDK retry strategy = Token buckets + Rate-Limiters + persistent state
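
The two building blocks named on this slide can be sketched in a few lines. This is an illustrative sketch only and not the AWS SDK's actual retry implementation: the jitter on the backoff, the explicit `now` clock argument, and all names and default values are assumptions.

```python
import random


def backoff_delays(base=0.1, cap=10.0, attempts=6, rng=random.random):
    """Exponential backoff with jitter: attempt n waits up to min(cap, base * 2**n)."""
    return [rng() * min(cap, base * 2 ** n) for n in range(attempts)]


class TokenBucket:
    """Simple token bucket: refills at `rate` tokens per second up to `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity      # start full
        self.last = now

    def try_acquire(self, now, cost=1.0):
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                # caller should shed or delay the request


if __name__ == "__main__":
    print(backoff_delays())
    bucket = TokenBucket(rate=1.0, capacity=2.0)
    print([bucket.try_acquire(now=0.0) for _ in range(3)])
```

The clock is passed in explicitly so the bucket is easy to test; a real caller might pass `time.monotonic()`. Gating retries behind such a bucket limits how much extra load a client can add while a dependency is struggling.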
  41. Power Laws
  42. X-Ray Vision 3: Liveness and Lag
  43. X-Ray Vision: Liveness and Lag
        • Operating on old information can be worse than operating on no information
        • Simple example: when a system gets very busy, workflows and metrics pipelines can back up
        • Ephemeral “shocks” such as spiky loads or brief outages can end up taking a very long time to recover from
  44. X-Ray Vision: Liveness and Lag
        • Strive for O(1) scaling as much as possible
            • Provision everything, every time
            • Report everything, every time
            • Do everything, every time
  45. X-Ray Vision: Liveness and Lag
        • If you need to use a bus or queue, think carefully about limits on the size of that queue
        • In general: short queues are safer
        • LIFO queues can be a great strategy for information channels
            • Naturally prioritizes recent state
            • Out of order back-fill for any “catching up”
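
A short, bounded, newest-first queue can be sketched with Python's `collections.deque`. The class name and the choice to silently drop the oldest entry when full are assumptions made for illustration.

```python
from collections import deque


class FreshestFirstQueue:
    """Bounded LIFO channel: keeps only the newest `maxlen` items."""

    def __init__(self, maxlen):
        # A deque with maxlen discards from the opposite end when full,
        # so appending on the right drops the oldest item on the left.
        self._items = deque(maxlen=maxlen)

    def put(self, item):
        self._items.append(item)

    def get(self):
        # Pop from the right: the most recent report is served first.
        return self._items.pop()

    def __len__(self):
        return len(self._items)


if __name__ == "__main__":
    q = FreshestFirstQueue(maxlen=3)
    for report in range(5):     # reports 0 and 1 are dropped as stale
        q.put(report)
    print(q.get(), q.get(), q.get())
```

After a backlog, the consumer sees the current state immediately and works backwards through older entries, which matches the out-of-order back-fill described on the slide.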
  46. X-Ray Vision 4: False Functions
  47. False functions [plot: axes labeled load and utilization]
  48. False functions [plot: axes labeled load and utilization]
  49. False functions
        • Hall of fame false function: Unix load
        • Runners-up: system latency, network latency
        • Hard-to-predict garbage collector behavior can be confounding
        • CPU can be surprisingly effective
  50. X-Ray Vision 5: Edge Triggering
  51. Edge Triggering [plot: axes labeled load and utilization]
  52. Edge Triggering
        • Edge Triggering invites modal behavior
        • Often the new mode kicks in at a time of high stress
        • Edge Triggering is often associated with the “Deliver exactly once” problem
        • OK for alerting humans, but usually an anti-pattern for control systems
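
The contrast with level triggering can be made concrete with a toy capacity controller. Both functions below are hypothetical sketches, and the event names and the 100-units-per-instance figure are assumptions. The edge-triggered version counts notifications, while the level-triggered version recomputes its decision from the current measurement on every tick.

```python
import math


def edge_triggered(events):
    """React only to change notifications; correctness needs every event exactly once."""
    capacity = 1
    for event in events:
        if event == "scale_up":
            capacity += 1
        elif event == "scale_down":
            capacity = max(1, capacity - 1)
    return capacity


def level_triggered(load_samples, per_instance=100):
    """Recompute capacity from the current load level on every tick."""
    capacity = 1
    for load in load_samples:
        if load is None:
            continue   # a lost sample is harmless; the next one corrects it
        capacity = max(1, math.ceil(load / per_instance))
    return capacity


if __name__ == "__main__":
    # Load rises to 300 and falls back to 100; one message is lost in each case.
    print(edge_triggered(["scale_up", "scale_up", "scale_down"]))   # second scale_down lost
    print(level_triggered([100, 200, 300, None, 100]))              # one sample lost
```

With a single lost message, the edge-triggered controller is left holding extra capacity indefinitely, while the level-triggered one converges again as soon as the next measurement arrives.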
  53. Summary
  54. Summary
        • “Measure first” and “Integrate feedback” are deeply rewarding concepts
        • Right now, this knowledge is highly leveraged
        • We can think of distributed systems in terms of control theory, with 100 years of powerful mental models available
        • Control Theory can help us formally analyze the stability of systems
  55. Q&A: Colm MacCárthaigh
  56. Thank you!
  57. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/pid-loops/
