Adaptive Availability for Quality of Service

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2baJEz8.

Theo Schlossnagle talks about lessons learned in building an always-on distributed time-series database with aggressive quality of service guarantees. He also talks about both macro and micro techniques used to cope with bad machines, bad actors and other poorly qualified badness. Most are adaptive techniques backed with both local and cluster-wide statistical analysis of observed behavior. Filmed at qconnewyork.com.

Theo Schlossnagle founded Circonus in 2010, and continues to be its principal architect. He is the author of Scalable Internet Architectures (Sams) and a frequent speaker at worldwide IT conferences. He is a member of the IEEE and a senior member of the ACM. He serves on the editorial board of the ACM's Queue Magazine.

Published in: Technology

Adaptive Availability for Quality of Service

  1. Sometimes (most times) down and out is better than slow. Adaptive Availability for Quality of Service
  2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/time-series-database
  3. Presented at QCon New York, www.qconnewyork.com. Purpose of QCon: to empower software development by facilitating the spread of knowledge and innovation. Strategy: a practitioner-driven conference designed for YOU, influencers of change and innovation in your teams; speakers and topics driving the evolution and innovation; connecting and catalyzing the influencers and innovators. Highlights: attended by more than 12,000 delegates since 2007; held in 9 cities worldwide.
  4. A new world order: Slow ≅ Byzantine. In most modern systems, users perceive: “slow is the new down.” In most distributed systems: “slow is indistinguishable from byzantine operations.”
  5. We had to be “very sure” in the Days of Failover. Primary:replica systems usually have non-zero operational costs when performing a failover: • data loss (in asynchronous systems) • operational downtime • operational rebuild time (reversing the flows)
  6. For well-designed, available systems, Constraints Have Changed. Deciding to fail a node is no longer a “last resort” decision.
  7. What do I mean by well-designed? The failure of a node does not cause: • service interruption • significant performance regressions. The recovery of a node does not cause: • unnecessary work (only minimal replay) • significant performance regressions
  8. A brief tangent on an Anecdotal Design: active feedback on replay performance.
  9. Snowth design ❖ Need: zero downtime. ❖ Know: agreement is hard. ❖ Know: consensus is expensive. ❖ CAP theorem tradeoffs suck. ❖ CRDT (Commutative Replicated Data Type). [Diagram: nodes n1 through n6]
  10. [Diagram: labels n1-1 through n6-4, four per node for nodes n1 through n6]
  11. [Diagram: the same labels n1-1 through n6-4, plus an item labeled o1]
  12. [Diagram: the same labels n1-1 through n6-4, plus o1]
  13. [Diagram: labels n1-1 through n6-4, without o1]
  14. [Diagram: labels n1-1 through n6-4, with Availability Zone 1 and Availability Zone 2 marked]
  15. [Diagram: o1 and labels n1-1 through n6-4, with Availability Zone 1 and Availability Zone 2 marked]
  16. [Diagram: Availability Zone 1 and Availability Zone 2, with o1 and labels n1-1 through n6-4]
  17. A look at adaptive algorithms in Replication: How do you choose the right unit of work for tasks?
  18. What does it sound like when a system Backfires? Batching is faster than single ops: • less latency impact • less transactional overhead. What about QoS enforcement and circuit breakers? Flogging TCP (and everything else) can teach us something.
  19. This provides us Opportunities: What if we had relative homogeneity of systems and workloads?
  20. Some problems get easier: Simplified Outlier Detection. If there is an implicit assumption that machines behave similarly, then it becomes much easier to determine when they fail to do so.
  21. New things become possible: Predicting Future Conditions. With higher volume data, statistical models offer higher confidence.
  22. We have better tools now that high-volume data isn’t intimidating: Better insight. That hairline contains >9MM samples. Histogram shown. 4 modes… WTF?
  23. It takes a good understanding of statistics to ask the right questions: Misleading yourself. This is a q(0.99), the 99th percentile. It obviously goes off the rails around 1am. No.
  24. It takes a good understanding of statistics to ask the right questions: Measuring what matters. Instead of measuring “how slow transactions are,” we measure “how many transactions are too slow.”
  25. We have a new tool in the tool chest: Intentionally Failing Nodes. When nodes are cattle, not pets…
  26. Expect more from your systems. Thank You. You can observe better, know more, don’t settle.
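
Slide 20 argues that when machines are expected to behave similarly, it becomes much easier to spot one that does not. The slides do not say which statistic the speaker uses, so the following is only a minimal sketch under an assumption of our own: a median and median-absolute-deviation (MAD) test over one metric per node. The function name `find_outlier_nodes`, the sample latencies, and the 3.5 cutoff are illustrative choices, not taken from the talk.

```python
from statistics import median


def find_outlier_nodes(metric_by_node, cutoff=3.5):
    """Return the nodes whose metric sits far from the cluster median.

    Assumption (not from the talk): a modified z-score based on the
    median absolute deviation (MAD) stands in for "behaves differently".
    """
    values = list(metric_by_node.values())
    if not values:
        return []
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # No measurable spread, so this test cannot rank anyone; flag nothing.
        return []
    return [
        node
        for node, value in metric_by_node.items()
        if 0.6745 * abs(value - center) / mad > cutoff
    ]


# Hypothetical per-node write latencies in milliseconds.
latency_ms = {"n1": 10.1, "n2": 9.8, "n3": 10.3, "n4": 10.0, "n5": 9.9, "n6": 48.0}
print(find_outlier_nodes(latency_ms))  # prints ['n6']
```

The median and MAD are used here instead of the mean and standard deviation because a single badly behaving node would otherwise inflate the very spread it is being compared against.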

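Slides 17 and 18 ask how to choose the right unit of work for replication and hint that TCP has something to teach. The transcript does not describe the actual algorithm, so the sketch below is our own assumption: it borrows TCP's additive-increase, multiplicative-decrease idea to size replay batches from observed latency. The class name `AdaptiveBatchSizer` and every parameter are hypothetical.

```python
class AdaptiveBatchSizer:
    """Grow the batch size slowly while the peer keeps up, shrink it quickly when it lags.

    Assumption (not from the talk): additive increase, multiplicative
    decrease, driven by the latency observed for the previous batch.
    """

    def __init__(self, target_latency_ms, min_size=1, max_size=10_000,
                 step=10, backoff=0.5):
        self.target_latency_ms = target_latency_ms
        self.min_size = min_size
        self.max_size = max_size
        self.step = step          # additive increase per healthy batch
        self.backoff = backoff    # multiplicative decrease per slow batch
        self.size = min_size

    def record(self, observed_latency_ms):
        """Feed back the latency of the last batch and return the next batch size."""
        if observed_latency_ms > self.target_latency_ms:
            self.size = max(self.min_size, int(self.size * self.backoff))
        else:
            self.size = min(self.max_size, self.size + self.step)
        return self.size


sizer = AdaptiveBatchSizer(target_latency_ms=50.0, max_size=100)
for latency in (20.0, 20.0, 80.0, 20.0):
    print(sizer.record(latency))  # prints 11, 21, 10, 20 on successive lines
```

Backing off faster than it ramps up keeps a struggling peer from being flooded, which matches the talk's stance that slow behavior should be treated as seriously as failure.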
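
Slide 24 recommends measuring “how many transactions are too slow” instead of “how slow transactions are.” A minimal sketch of that shift follows; the 100 ms threshold, the sample values, and the function names are illustrative assumptions, not figures from the talk.

```python
def count_too_slow(latencies_ms, threshold_ms):
    """Count the transactions whose latency exceeds the service-level threshold."""
    return sum(1 for latency in latencies_ms if latency > threshold_ms)


def fraction_too_slow(latencies_ms, threshold_ms):
    """Return the share of transactions over the threshold (0.0 when there are none)."""
    if not latencies_ms:
        return 0.0
    return count_too_slow(latencies_ms, threshold_ms) / len(latencies_ms)


# Hypothetical latencies for one measurement window, in milliseconds.
window = [12.0, 15.0, 11.0, 250.0, 14.0, 13.0, 900.0, 12.0, 16.0, 10.0]
print(count_too_slow(window, threshold_ms=100.0))     # prints 2
print(fraction_too_slow(window, threshold_ms=100.0))  # prints 0.2
```

Unlike a percentile, these counts can be summed across nodes and time windows, so a cluster-wide figure can be built from per-node values without re-reading the raw samples.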