TDEA 2018 Kafka EOS (Exactly-once)

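Slides from the Taiwan Data Engineering Association (TDEA) 2018 annual conference talk on Kafka EOS (Exactly-once). The talk uses diagrams to explain how Kafka implements the idempotent producer and how the producer and broker communicate under the hood.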

Published in: Technology

  1. Kafka (Exactly-once)
  3. [Slide text garbled in extraction] erhwenkuo@gmail.com
  4. Agenda • Why exactly-once? • An overview of messaging semantics • Why are duplicates introduced? • What is exactly-once semantics? • Exactly-once semantics in Kafka
  5. Kafka Exactly-once
  6. An overview of messaging semantics: Kafka message delivery semantics • At most once: offsets are committed as soon as the message is received. If the processing goes wrong, the message is lost (it won’t be read again). • At least once: offsets are committed after the message is processed. If the processing goes wrong, the message is read again. This can result in duplicate processing of messages, so make sure your processing is idempotent (i.e. processing the message again won’t affect your systems). • Exactly once: very difficult to achieve; needs strong engineering. (Kafka started to provide “exactly once” in v0.11.)
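
The difference between the first two semantics comes down to when the offset is committed relative to processing. The following is a minimal sketch in plain Python, not Kafka client code: the functions `at_most_once` and `at_least_once`, the in-memory message list, and the single simulated crash at offset `crash_at` are all assumptions made for illustration.

```python
def at_most_once(messages, crash_at):
    """Commit the offset on receipt, then process (hypothetical simulation)."""
    committed, processed, crashed = 0, [], False
    while committed < len(messages):
        offset = committed
        committed = offset + 1                # commit as soon as the message is received
        if offset == crash_at and not crashed:
            crashed = True                    # simulated crash before processing
            continue                          # restart resumes from the committed offset
        processed.append(messages[offset])
    return processed


def at_least_once(messages, crash_at):
    """Process first, then commit the offset (hypothetical simulation)."""
    committed, processed, crashed = 0, [], False
    while committed < len(messages):
        offset = committed
        processed.append(messages[offset])    # process before committing
        if offset == crash_at and not crashed:
            crashed = True                    # simulated crash before the commit
            continue                          # restart re-reads the same offset
        committed = offset + 1
    return processed


print(at_most_once(["m0", "m1", "m2"], crash_at=1))   # m1 is never processed
print(at_least_once(["m0", "m1", "m2"], crash_at=1))  # m1 is processed twice
```

In the first call the crashed message is skipped for good; in the second it is processed twice, which is why at-least-once consumers need idempotent processing.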
  7. Why exactly-once? • Stream processing is becoming the norm; it’s more natural. • Apache Kafka is the most popular streaming platform. • Mission critical applications require stronger guarantees.
  8. Apache Kafka’s existing semantics: At Least Once
  9. Kafka’s Existing Semantics: At-least-once. [Diagram: a Producer, and the Kafka Brokers hosting the leader partition of topic xxx and its log.]
  10. Kafka’s Existing Semantics: At-least-once. [Diagram: the producer calls Send(x, y) with key x and value y.]
  11. Kafka’s Existing Semantics: At-least-once. [Diagram: the broker runs append(x, y); the log now holds (x, y).]
  12. Kafka’s Existing Semantics: At-least-once. [Diagram: the broker returns ack to the producer.]
  13. Kafka’s Existing Semantics: At-least-once. [Diagram: the producer calls Send(a, b) with key a and value b.]
  14. Kafka’s Existing Semantics: At-least-once. [Diagram: the broker runs append(a, b); the log now holds (x, y), (a, b).]
  15. Kafka’s Existing Semantics: At-least-once. [Diagram: the broker sends ack; the annotation on the ack is garbled, but the following slides imply it does not reach the producer.]
  16. Kafka’s Existing Semantics: At-least-once. [Diagram: same state; the log holds (x, y), (a, b) and the ack is outstanding.]
  17. Kafka’s Existing Semantics: At-least-once. [Diagram: the producer retries and calls Send(a, b) again.]
  18. Kafka’s Existing Semantics: At-least-once. [Diagram: the broker runs append(a, b) a second time; the log now holds (x, y), (a, b), (a, b).]
  19. Kafka’s Existing Semantics: At-least-once. [Diagram: the broker returns ack; the log contains (a, b) twice.] At-least-once!
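
The retry sequence in slides 9 to 19 can be replayed with a small simulation. This is a sketch in plain Python, not the Kafka client: the `Broker` class, the `send_with_retry` helper, and the `acks_lost` parameter (the number of acks assumed to be dropped by the network) are hypothetical names introduced for illustration.

```python
class Broker:
    """Hypothetical stand-in for a partition leader without duplicate detection."""

    def __init__(self):
        self.log = []

    def append(self, key, value):
        self.log.append((key, value))   # every request is appended unconditionally
        return "ack"


def send_with_retry(broker, key, value, acks_lost=0):
    """Resend until an ack arrives; the first `acks_lost` acks never reach the producer."""
    attempts = 0
    while True:
        broker.append(key, value)
        attempts += 1
        if attempts > acks_lost:        # this ack got through
            return attempts
        # Ack lost: the producer cannot tell whether the write happened, so it retries.


broker = Broker()
send_with_retry(broker, "x", "y")               # normal send: one append, one ack
send_with_retry(broker, "a", "b", acks_lost=1)  # ack lost once, so the send is repeated
print(broker.log)  # [('x', 'y'), ('a', 'b'), ('a', 'b')]
```

The broker has no way to tell a retry from a new message, so the log ends up with (a, b) twice.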
  20. Why are duplicates introduced? Various failures must be handled correctly: • Broker can fail • Producer-to-Broker RPC can fail • Network between Producer & Broker can fail • Producer client can fail • Producer client can become a zombie
  21. Semantic Weaknesses of At-least-once • Producer retries are not safe • Processed data is not written atomically with corresponding offsets • No protection from evil zombies
  22. How did Kafka add exactly-once semantics? version >= 0.11
  23. Exactly-once semantics in Kafka, explained. Apache Kafka’s guarantees are stronger in 3 ways: • Idempotent producer: exactly-once, in-order delivery per partition. • Transactions: atomic writes across multiple topics/partitions. • Exactly-once stream processing (Kafka Streams & KSQL): across read-process-write tasks.
  24. Idempotent Producer: exactly-once, in-order delivery per partition
  25. Idempotent Producer Semantics • Idempotence is the other name for exactly-once. To stop a message from being processed multiple times, it must be persisted to the Kafka topic only once. • A single successful producer.send( ) will result in exactly one copy of the message in the log in all circumstances. • Idempotent delivery ensures that messages are delivered exactly once to a particular topic partition during the lifetime of a single producer.
  26. How does the idempotent producer work? Key Design Principle. Idempotent producer: • Exactly-once, in-order delivery per partition. • Avoids data duplication. • Works transparently; only one config change. • Resilient to broker failures, producer retries, etc.
  27. How does the idempotent producer work? Message Binary Format Change. Idempotent producer: • Change the log message binary format • Add “ProducerId” • Add “Sequence” number. Message format fields: offset, key, value, timestamp, headers, producerid, sequence.
  28. The idempotent producer. [Diagram: a Producer with pid = 100, seq = 0, and the Kafka Brokers hosting the leader partition of topic xxx and its log.]
  29. The idempotent producer. [Diagram: the producer calls Send(x, y); the message carries key x, value y, pid 100, seq 0.]
  30. The idempotent producer. [Diagram: the broker runs append(x, y); the log now holds (x, y, pid 100, seq 0).]
  31. The idempotent producer. [Diagram: the broker records pid = 100, seq = 0 and returns ack.]
  32. The idempotent producer. [Diagram: the producer moves to seq = 1 and calls Send(a, b) with pid 100, seq 1; the broker still records pid = 100, seq = 0.]
  33. The idempotent producer. [Diagram: the broker runs append(a, b); the log now holds (x, y, pid 100, seq 0), (a, b, pid 100, seq 1) and the broker records pid = 100, seq = 1.]
  34. The idempotent producer. [Diagram: the broker sends ack; the annotation on the ack is garbled, but the following slides imply it does not reach the producer.]
  35. The idempotent producer. [Diagram: the producer, still at seq = 1, retries and calls Send(a, b) again with pid 100, seq 1.]
  36. The idempotent producer. Broker found duplicate (pid + seq)! [Diagram: the broker does not append the message again and returns “ack - duplicate”; the log still holds one copy each of (x, y) and (a, b).]
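
The exchange in slides 28 to 36 can be sketched the same way, this time with the broker tracking the last sequence number seen per producer id. This is a simplified model under stated assumptions, not Kafka's implementation: `IdempotentBroker`, `IdempotentProducer`, and `acks_lost` are hypothetical names, a single partition is assumed, and the sketch ignores details such as sequence gaps and batching.

```python
class IdempotentBroker:
    """Hypothetical partition leader that deduplicates on (pid, seq)."""

    def __init__(self):
        self.log = []
        self.last_seq = {}                        # pid -> highest sequence appended so far

    def append(self, pid, seq, key, value):
        if seq <= self.last_seq.get(pid, -1):
            return "ack-duplicate"                # already in the log: do not append again
        self.log.append((pid, seq, key, value))
        self.last_seq[pid] = seq
        return "ack"


class IdempotentProducer:
    """Hypothetical producer that tags each message with its pid and a sequence number."""

    def __init__(self, broker, pid):
        self.broker, self.pid, self.next_seq = broker, pid, 0

    def send(self, key, value, acks_lost=0):
        seq = self.next_seq                       # a retry reuses the same sequence number
        self.next_seq += 1
        responses = []
        for _ in range(acks_lost + 1):            # one extra attempt per lost ack
            responses.append(self.broker.append(self.pid, seq, key, value))
        return responses


broker = IdempotentBroker()
producer = IdempotentProducer(broker, pid=100)
first = producer.send("x", "y")                   # ['ack']
second = producer.send("a", "b", acks_lost=1)     # ['ack', 'ack-duplicate']
print(first, second)
print(broker.log)  # [(100, 0, 'x', 'y'), (100, 1, 'a', 'b')]
```

Because the retry carries the same pid and seq as the original request, the broker can recognise it and acknowledge without writing a second copy.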
  37. Producer Configs • idempotent=true • retries=infinite • acks = all • max.inflight=1 ??
  38. Producer Configs https://issues.apache.org/jira/browse/KAFKA-5494
  39. Producer Configs (Revised) • idempotent=true • retries=infinite • acks = all • max.inflight=3 (or whatever)
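
The slides use shorthand for the settings. In the Java producer client the corresponding property names are `enable.idempotence`, `retries`, `acks`, and `max.in.flight.requests.per.connection`. The sketch below writes the revised settings as a plain Python dict and checks them against the constraints the Kafka documentation states for idempotence (acks must be `all`, retries greater than 0, at most 5 in-flight requests per connection). The dict and the `check_idempotent_config` helper are illustrative assumptions and not part of any Kafka client; "infinite" retries is approximated by the largest 32-bit integer.

```python
MAX_INT = 2**31 - 1  # stands in for "retries=infinite" (Integer.MAX_VALUE in Java)

producer_config = {
    "enable.idempotence": True,                  # "idempotent=true" on the slide
    "retries": MAX_INT,
    "acks": "all",
    "max.in.flight.requests.per.connection": 3,  # "max.inflight=3 (or whatever)"
}


def check_idempotent_config(cfg):
    """Return a list of problems; an empty list means the settings are compatible."""
    problems = []
    if cfg.get("enable.idempotence") is not True:
        problems.append("enable.idempotence must be true")
    if cfg.get("acks") != "all":
        problems.append("acks must be 'all'")
    if cfg.get("retries", 0) <= 0:
        problems.append("retries must be greater than 0")
    in_flight = cfg.get("max.in.flight.requests.per.connection", 5)
    if not 1 <= in_flight <= 5:
        problems.append("max.in.flight.requests.per.connection must be between 1 and 5")
    return problems


print(check_idempotent_config(producer_config))  # []
```

Any in-flight value from 1 to 5 passes the check, which matches the "or whatever" on the revised slide; on clients that predate the KAFKA-5494 change the limit was 1.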