
Exactly-once Semantics in Apache Kafka


Apache Kafka's rise in popularity as a streaming platform has prompted a fresh look at its traditional at-least-once message delivery semantics.
In this talk, we present the recent additions to Kafka that achieve exactly-once semantics (EoS), including support for idempotence and transactions in the Kafka clients. The main focus is the specific semantics that Kafka's distributed transactions enable and the underlying mechanics that allow them to scale efficiently.



  1. Introducing Exactly Once Semantics in Apache Kafka™ (Apurva Mehta, Software Engineer; Gehrig Kunz, Technical Product Marketing Manager)
  2. Agenda • Why exactly-once? • An overview of messaging semantics • Why are duplicates introduced? • What is exactly-once semantics? • Exactly-once semantics in Kafka: is it practical? • Next steps
  3. Exactly-once semantics is a hard problem
  4. An overview of messaging semantics • At-most-once • At-least-once • Exactly-once
  5. Why exactly-once? • Stream processing is becoming the norm; it’s more natural. • Apache Kafka is the most popular streaming platform. • Mission-critical applications require stronger guarantees.
  6. Why exactly-once? In other words: make stream processing easy, simple, and reliable enough for everyone.
  7. Apache Kafka’s existing semantics: at least once
  8–13. Kafka’s Existing Semantics (animated diagram build; slide images not captured)
  14. Kafka’s Existing Semantics: what do we do now?
  15–17. Kafka’s Existing Semantics: At Least Once (animated diagram build; slide images not captured)
  18. Why are duplicates introduced? Various failures must be handled correctly: • the broker can fail • the producer-to-broker RPC can fail • the producer or consumer client can fail
  19. TL;DR, what we have today: • at-least-once, in-order delivery per partition • producer retries can introduce duplicates (and headaches)
  20. The age-old engineering question: before we make this work, are we sure we should?
  21. KafkaCash: a peer-to-peer lending app
  22. Help Bob reach $1000: send him $10
  23. KafkaCash, powered by Kafka
  24. Offset commits
  25. Reprocessed transfer, eek!
  26. Lost money! Eek eek!
  27. How did Kafka add exactly-once semantics?
  28. Exactly-once semantics in Kafka, explained. Apache Kafka’s guarantees are stronger in three ways: • Idempotent producer: exactly-once, in-order delivery per partition. • Transactions: atomic writes across partitions. • Exactly-once stream processing across read-process-write tasks.
  29. Part 1/3: Idempotent Producer. Exactly-once, in-order delivery per partition.
  30. Idempotent producer semantics: a single (successful!) producer.send will result in exactly one copy of the message in the log, in all circumstances.
  31. Producer configs • enable.idempotence = true • max.in.flight.requests.per.connection = 1 • acks = “all” • retries > 0 (preferably Integer.MAX_VALUE)
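The producer configs on slide 31 can be sketched as a plain `java.util.Properties` object; a real application would also set `bootstrap.servers` and serializers and pass the result to `new KafkaProducer<>(props)`:

```java
import java.util.Properties;

public class IdempotentProducerConfig {
    // Configuration for the idempotent producer (slide 31). These are the
    // standard Kafka producer property names; values are shown as strings.
    public static Properties build() {
        Properties props = new Properties();
        props.put("enable.idempotence", "true");                 // turn on the idempotent producer
        props.put("max.in.flight.requests.per.connection", "1"); // preserve ordering across retries
        props.put("acks", "all");                                // wait for all in-sync replicas
        props.put("retries", String.valueOf(Integer.MAX_VALUE)); // retry (effectively) forever
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

With `acks=all` and unbounded retries, a send only succeeds once the write is fully replicated, and the sequence numbers added by idempotence let the broker discard any duplicate retries.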
  32–39. The idempotent producer (animated diagram build; slide images not captured)
  40. TL;DR: idempotent producer • Works transparently; only one config change. • Sequence numbers and producer IDs are stored in the log. • Resilient to broker failures, producer retries, etc.
  41. Part 2/3: Transactions. Atomic writes across multiple partitions.
  42. Transaction semantics • Atomic writes across multiple partitions. • All messages in a transaction are made visible together, or none are. • Consumers must be configured to skip uncommitted messages.
  43. Producer config for transactions • transactional.id = “some string” • Typically based on the partition identifier in a partitioned, stateful app. • Enables transaction recovery across producer sessions.
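As a sketch of slide 43, the transactional producer config only needs one extra property on top of idempotence. The id `"transfer-app-partition-0"` below is hypothetical; as the slide notes, a partitioned, stateful app would derive it from the input partition so that a restarted instance can recover (and fence) its predecessor's transactions:

```java
import java.util.Properties;

public class TransactionalProducerConfig {
    // Transactional producer configuration (slide 43).
    public static Properties build(String transactionalId) {
        Properties props = new Properties();
        props.put("transactional.id", transactionalId); // stable across producer sessions
        props.put("enable.idempotence", "true");        // transactions build on idempotence
        props.put("acks", "all");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build("transfer-app-partition-0"));
    }
}
```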
  44. The transaction API:

      producer.initTransactions();
      try {
        producer.beginTransaction();
        producer.send(record0);
        producer.send(record1);
        producer.commitTransaction();
      } catch (KafkaException e) {
        producer.abortTransaction();
      }
  45. Transactions
  46. Step 1: initialize the producer (producer.initTransactions())
  47. Initializing ‘transactions’
  48. Step 2: begin the transaction and send data (producer.beginTransaction(); producer.send(record0); producer.send(record1))
  49–50. Transactional sends (animated diagram build; slide images not captured)
  51. Step 3: commit the transaction (producer.commitTransaction())
  52–54. Commit (animated diagram build; slide images not captured)
  55. Success!
  56. Consumer configs • isolation.level: “read_committed” or “read_uncommitted”
  57. What do you get with isolation levels? • read_committed: consumers read up to the point where there are no open transactions. • read_uncommitted: reads everything, including uncommitted messages. • In both cases, messages are read in offset order.
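The consumer side of slides 56–57 can be sketched the same way; the group id `"kafkacash-transfers"` is hypothetical, and a real consumer would also need `bootstrap.servers` and deserializers:

```java
import java.util.Properties;

public class ReadCommittedConsumerConfig {
    // Consumer configuration (slide 56). With read_committed, the consumer
    // delivers a transaction's messages only after that transaction commits,
    // and filters out messages from aborted transactions entirely.
    public static Properties build() {
        Properties props = new Properties();
        props.put("isolation.level", "read_committed"); // or "read_uncommitted"
        props.put("group.id", "kafkacash-transfers");   // hypothetical group for the example
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```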
  58. TL;DR: transactions • Atomic, multi-partition writes. • Use the new producer APIs for transactions. • Consumers can filter out messages from uncommitted or aborted transactions.
  59. Part 3/3: Stream Processing with exactly-once semantics
  60. Streams config • processing.guarantee = “exactly_once”
  61. End-to-end exactly-once semantics • The read-process-write operation is atomic. • Thus stream tasks produce valid answers even when failures happen.
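A minimal sketch of the Streams setup: in the released Kafka 0.11 Streams API the config key is `processing.guarantee`, and setting it wraps each task's consumed offsets and produced records in a single Kafka transaction, which is what makes the read-process-write step atomic. The application id below is hypothetical:

```java
import java.util.Properties;

public class ExactlyOnceStreamsConfig {
    // Kafka Streams configuration for exactly-once processing. With this
    // guarantee enabled, a failed task's partial output (including its offset
    // commits) is aborted and never seen by read_committed consumers.
    public static Properties build() {
        Properties props = new Properties();
        props.put("application.id", "kafkacash-transfers"); // hypothetical app id
        props.put("processing.guarantee", "exactly_once");  // default is "at_least_once"
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```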
  62. Back to KafkaCash
  63. Exactly-once semantics in Kafka: is it practical?
  64. Performance boost in Apache Kafka 0.11! • Up to 20% higher producer throughput • Up to 50% higher consumer throughput • Up to 20% less disk utilization • Details: https://bit.ly/kafka-eos-perf
  65. The gains come from the more efficient message format.
  66. What about the idempotent producer and transactions? • Transactions: 3–5% overhead for 100 ms transactions with 1 KB messages. • Longer transactions and better batching yield better performance. • 20% overhead relative to at-most-once delivery with no ordering guarantees. • The idempotent producer alone has negligible overhead.
  67. Putting it together • We walked through the idempotent producer • How we added transactions with atomic writes • The impact on stream processing
  68. When is it available? In Kafka 0.11, June 2017.
  69. Where we’ve come: 2007, high-throughput messaging broker; 2008, highly available replicated log; 2012, top-level Apache project; 2016, Streams API and Connect API; 2017, exactly-once semantics.
  70. San Francisco, August 28, 2017. Organized by Confluent.
  71. What’s next for you • Try it: download Confluent Open Source • Join the community: slackpass.io/confluentcommunity • Let us know what you think: @Confluent
  72. Thank You!
