
Devoxx university - Kafka de haut en bas

Kafka, Kafka Streams, KSQL, and Kafka Connect.

Published in: Software

  1. 1. #DevoxxFR Kafka … de haut en bas ! University Florent Ramière @framiere Jean-Louis Boudart @jlboudart Nicolas Romanetti @nromanetti 1
  2. 2. 2 Massive volumes of new data generated every day Mobile Cloud Microservices Internet of Things Machine Learning Distributed across apps, devices, datacenters, clouds Structured, unstructured, polymorphic What
  3. 3. 3 Problem?
  4. 4. 4 Silos explained by Data Gravity concept As data accumulates (builds mass) there is a greater likelihood that additional services and applications will be attracted to this data. This is the same effect gravity has on objects around a planet. As the mass or density increases, so does the strength of gravitational pull.
  5. 5. 5 With
  6. 6. 6 How
  7. 7. 7 Store & ETL Process Publish & Subscribe In short
  8. 8. 8 From a simple idea
  9. 9. 9 From a simple idea
  10. 10. 10 with great properties! • Scalability • Retention • Durability • Replication • Security • Resiliency • Throughput • Ordering • Exactly-Once Semantics • Transactions • Idempotency • Immutability • …
  11. 11. 11 11 Producer
  12. 12. 12 Anatomy of a Message
  13. 13. 13
  14. 14. 14 Producing to Kafka - No Key Time Messages will be produced in a round-robin fashion
  15. 15. 15 Producing to Kafka - With Key Time A B C D hash(key) % numPartitions = N
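
     A minimal Java producer sketch of both cases above, assuming a local broker on localhost:9092 and string serializers; the topic name and payloads are hypothetical:

        import org.apache.kafka.clients.producer.KafkaProducer;
        import org.apache.kafka.clients.producer.ProducerRecord;
        import java.util.Properties;

        public class ProducerDemo {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
                props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
                props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
                try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                    // No key: records are spread over the partitions (round-robin)
                    producer.send(new ProducerRecord<>("topic1", "no-key payload"));
                    // With key "A": hash(key) % numPartitions pins every "A" record
                    // to the same partition, preserving per-key ordering
                    producer.send(new ProducerRecord<>("topic1", "A", "keyed payload"));
                }
            }
        }
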
  16. 16. 16 Partition Leadership and Replication Broker 1 Topic1 partition1 Broker 2 Broker 3 Broker 4 Topic1 partition1 Topic1 partition1 Leader Follower Topic1 partition2 Topic1 partition2 Topic1 partition2 Topic1 partition3 Topic1 partition4 Topic1 partition3 Topic1 partition3 Topic1 partition4 Topic1 partition4
  17. 17. 17 Partition Leadership and Replication - node failure Broker 1 Topic1 partition1 Broker 2 Broker 3 Broker 4 Topic1 partition1 Topic1 partition1 Leader Follower Topic1 partition2 Topic1 partition2 Topic1 partition2 Topic1 partition3 Topic1 partition4 Topic1 partition3 Topic1 partition3 Topic1 partition4 Topic1 partition4
  18. 18. 18 Producer Guarantees P Broker 1 Broker 2 Broker 3 Topic1 partition1 Leader Follower Topic1 partition1 Topic1 partition1 Producer Properties acks=0
  19. 19. 19 Producer Guarantees P Broker 1 Broker 2 Broker 3 Topic1 partition1 Leader Follower Topic1 partition1 Topic1 partition1 ack Producer Properties acks=1
  20. 20. 20 Producer Guarantees P Broker 1 Broker 2 Broker 3 Topic1 partition1 Leader Follower Topic1 partition1 Topic1 partition1 Producer Properties acks=all min.insync.replicas=2 First copy returns ack ack
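
     The three acks levels above expressed as producer properties; a sketch, noting that min.insync.replicas is a broker/topic-level setting rather than a client one:

        import java.util.Properties;

        public class AcksConfig {
            public static Properties producerProps() {
                Properties props = new Properties();
                // acks=0  : fire and forget, no broker acknowledgement
                // acks=1  : the partition leader acknowledges after its local write
                // acks=all: the leader waits for all in-sync replicas
                props.put("acks", "all");
                // min.insync.replicas=2 is configured on the broker or topic:
                // with acks=all, writes fail unless at least 2 replicas are in sync
                return props;
            }
        }
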
  21. 21. 21 21 Consumer
  22. 22. 22 Consuming From Kafka - Single Consumer C
  23. 23. 23 Consuming From Kafka - Grouped Consumers CC C1 CC C2
  24. 24. 24 Consuming From Kafka - Grouped Consumers C C C C
  25. 25. 25 Consuming From Kafka - Grouped Consumers 0 1 2 3
  26. 26. 26 Consuming From Kafka - Grouped Consumers 0 1 2 3
  27. 27. 27 Consuming From Kafka - Grouped Consumers 0, 3 1 2 3
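
     A sketch of a grouped consumer; every instance started with the same (hypothetical) group.id splits the topic's partitions with its peers, which is the rebalancing pictured in the slides above:

        import org.apache.kafka.clients.consumer.ConsumerRecord;
        import org.apache.kafka.clients.consumer.ConsumerRecords;
        import org.apache.kafka.clients.consumer.KafkaConsumer;
        import java.time.Duration;
        import java.util.List;
        import java.util.Properties;

        public class ConsumerDemo {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
                props.put("group.id", "demo-group"); // instances sharing this id share the partitions
                props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                    consumer.subscribe(List.of("topic1"));
                    while (true) {
                        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                        for (ConsumerRecord<String, String> record : records) {
                            System.out.printf("partition=%d offset=%d value=%s%n",
                                    record.partition(), record.offset(), record.value());
                        }
                    }
                }
            }
        }
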
  28. 28. 28 Compacted Topics – Keep only the most recent value for a key
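
     Compaction is a per-topic setting. A hedged AdminClient sketch creating a compacted topic; the topic name, partition/replica counts and broker address are assumptions:

        import org.apache.kafka.clients.admin.AdminClient;
        import org.apache.kafka.clients.admin.NewTopic;
        import java.util.Map;
        import java.util.Properties;
        import java.util.Set;

        public class CompactedTopicDemo {
            public static void main(String[] args) throws Exception {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
                try (AdminClient admin = AdminClient.create(props)) {
                    // cleanup.policy=compact keeps only the latest record per key
                    NewTopic topic = new NewTopic("user-profiles", 3, (short) 3)
                            .configs(Map.of("cleanup.policy", "compact"));
                    admin.createTopics(Set.of(topic)).all().get();
                }
            }
        }
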
  29. 29. 29 29 Destroy all the magic!
  30. 30. 30 Open protocol https://kafka.apache.org/protocol
  31. 31. 31 31 Broker Lifecycle
  32. 32. 32 Anatomy of a Producer Request on a Broker
  33. 33. 33 Anatomy of a Fetch Request on a Broker
  34. 34. 34 34 Not so fast!
  35. 35. 35 Maturity model (diagram): value grows with investment & time, from pre-streaming through 01 Learn Kafka (understand streaming; set up secure Kafka & build your first app), 02 Go To Production (monitor & manage a mission-critical solution), 03 Solve A Critical Need, 04 Break Silos (infrastructure & applications across LOBs), to 05 Stream Everything (self-service on shared Kafka)
  36. 36. 36 Maturity model (same diagram, next stage highlighted)
  37. 37. 37 Maturity model (same diagram, next stage highlighted)
  38. 38. 38 Business Value!
  39. 39. 39 39 This is a full platform
  40. 40. 40 … spawned a full platform: Confluent Platform builds on Apache Kafka® (Core | Connect API | Streams API). Open source features: KSQL | Schema Registry (stream processing & compatibility), Clients | Connectors | REST Proxy (connectivity). Commercial features: Control Center | Security (administration & monitoring), Replicator | Auto Data Balancer | Connectors | MQTT Proxy | Operator (operations). Runs customer self-managed (datacenter, public cloud) or fully managed by Confluent (Confluent Cloud). Integrates sources (database changes, log events, IoT data, web events) with sinks (Hadoop, database, data warehouse, CRM) for data integration, transformations and real-time applications (custom apps, analytics, monitoring)
  41. 41. 41 41 ETL
  42. 42. 42 I
  43. 43. 43 43 Start small
  44. 44. 44
  45. 45. 45
  46. 46. 46
  47. 47. 47
  48. 48. 48
  49. 49. 49
  50. 50. 50 50 More “Real life” databases
  51. 51. 51
  52. 52. 52 T1
  53. 53. 53 T1,T2,T3
  54. 54. 54 T1,T2,T3 … T214 ?
  55. 55. 55 T1,T2,T3 … T214 T1-T70 T71-T139 T140-T214
  56. 56. 56 T1,T2,T3 … T214 T1-T70 T140-T214 T71-T139
  57. 57. 57 T1,T2,T3 … T214 T1-T70 T140-T214 T71-T104 T105-T139
  58. 58. 58 T1,T2,T3 … T214 T1-T70 T71-T139 T140-T214
  59. 59. 59 T1,T2,T3 … T223 T1-T70 T71-T139 T140-T214 ?
  60. 60. 60 T1,T2,T3 … T223 T1-T70 T71-T139 T140-T214 ?
  61. 61. 61 Apache Kafka Connect API: Import and Export Data In & Out of Kafka JDBC Mongo MySQL Elastic Cassandra HDFS Kafka Connect API Kafka Pipeline Connector Connector Connector Connector Connector Connector Sources Sinks Fault tolerant Manage hundreds of data sources and sinks Preserves data schema Integrated within Confluent Control Center
  62. 62. 62 Connectors: Connect Kafka Easily with Data Sources and Sinks Databases Datastore/File Store Analytics Applications / Other
  63. 63. 63 Kafka Connect API, Part of the Apache Kafka™ Project Connect any source to any target system Integrated • 100% compatible with Kafka v0.9 and higher • Integrated with Confluent’s Schema Registry • Easy to manage with Confluent Control Center Flexible • 40+ open source connectors available • Easy to develop additional connectors • Flexible support for data types and formats Compatible • Maintains critical metadata • Preserves schema information • Supports schema evolution Reliable • Automated failover • Exactly-once guarantees • Balances workload between nodes
  64. 64. 64 Confluent Hub - The Kafka App Store
  65. 65. 65 65 Connectivity
  66. 66. 66 Clients: Communicate with Kafka in a Broad Variety of Languages Apache Kafka Confluent Platform Community Supported Proxy http/REST stdin/stdout Confluent Platform Clients developed and fully supported by Confluent
  67. 67. 67 REST Proxy: Talking to Non-native Kafka Apps and Outside the Firewall REST Proxy Non-Java Applications Native Kafka Java Applications Schema Registry REST / HTTP Simplifies administrative actions Simplifies message creation and consumption Provides a RESTful interface to a Kafka cluster
  68. 68. 68 68 Processing
  69. 69. 69 Stream Processing by Analogy Kafka Cluster Connect API Stream Processing Connect API $ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
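
     The same pipeline written against the Streams API, as a sketch; topic names mirror the shell example and the application id is hypothetical:

        import org.apache.kafka.common.serialization.Serdes;
        import org.apache.kafka.streams.KafkaStreams;
        import org.apache.kafka.streams.StreamsBuilder;
        import org.apache.kafka.streams.StreamsConfig;
        import java.util.Properties;

        public class GrepTrDemo {
            public static void main(String[] args) {
                StreamsBuilder builder = new StreamsBuilder();
                builder.<String, String>stream("in")                   // cat < in.txt
                       .filter((key, value) -> value.contains("ksql")) // grep "ksql"
                       .mapValues(value -> value.toUpperCase())        // tr a-z A-Z
                       .to("out");                                     // > out.txt

                Properties props = new Properties();
                props.put(StreamsConfig.APPLICATION_ID_CONFIG, "grep-tr-demo"); // hypothetical id
                props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
                props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
                props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
                new KafkaStreams(builder.build(), props).start();
            }
        }
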
  70. 70. 70 • subscribe() • poll() • send() • flush() Consumer, Producer Flexibility Simplicity Trade-offs
  71. 71. 71 Low Level API Consumer Producer
  72. 72. 72 • subscribe() • poll() • send() • flush() Consumer, Producer • mapValues() • filter() • punctuate() Kafka Streams Flexibility Simplicity Trade-offs
  73. 73. 73 High level API
  74. 74. App Streams API Not running inside brokers!
  75. 75. Consumer Group Protocol Power! App Streams API App Streams API App Streams API Same app, many instances
  76. 76. 76 Before Dashboard Processing Cluster Your Job Shared Database
  77. 77. 77 After Dashboard APP Streams API
  78. 78. 78 Things Kafka Streams Does Runs everywhere Clustering done for you Exactly-once processing Event-time processing Integrated database Joins, windowing, aggregation S/M/L/XL/XXL/XXXL sizes
  79. 79. 79 • subscribe() • poll() • send() • flush() Consumer, Producer • mapValues() • filter() • punctuate() Kafka Streams Flexibility Simplicity Trade-offs
  80. 80. 80 80 Kafka Streams Time
  81. 81. 81 Time! Time! Time! Time! Time! Time! Time! Time!
  82. 82. 82 Windowing in Kafka Streams
  83. 83. 83 Tumbling time windows 83
  84. 84. 84 Hopping time windows 84
  85. 85. 85 Session windows 85
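
     The three window types sketched with the Streams DSL (API names as of Kafka 2.x; the input topic is hypothetical):

        import org.apache.kafka.streams.StreamsBuilder;
        import org.apache.kafka.streams.kstream.KTable;
        import org.apache.kafka.streams.kstream.TimeWindows;
        import org.apache.kafka.streams.kstream.Windowed;
        import java.time.Duration;

        public class WindowDemo {
            public static void main(String[] args) {
                StreamsBuilder builder = new StreamsBuilder();
                // Tumbling: fixed 5-minute buckets, no overlap
                KTable<Windowed<String>, Long> counts = builder.<String, String>stream("clicks")
                        .groupByKey()
                        .windowedBy(TimeWindows.of(Duration.ofMinutes(5)))
                        .count();
                // Hopping: same size, but a new 5-minute window starts every minute:
                //   TimeWindows.of(Duration.ofMinutes(5)).advanceBy(Duration.ofMinutes(1))
                // Session: a window per key closes after 5 minutes of inactivity:
                //   SessionWindows.with(Duration.ofMinutes(5))
            }
        }
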
  86. 86. 86 Event Time Processing Event-time: "The point in time when an event or data record occurred, i.e. was originally created by the source. Achieving event-time semantics typically requires embedding timestamps in the data records at the time a data record is being produced." Processing-time: "The point in time when the event or data record happens to be processed by the stream processing application, i.e. when the record is being consumed. The processing-time may be milliseconds, hours, or days etc. later than the original event-time." Ingestion-time: "The point in time when an event or data record is stored in a topic partition by a Kafka broker."
  87. 87. 87 87 Kafka Streams Exactly-Once Semantics
  88. 88. 88 Delivery Guarantee At most once “Messages may be lost but are never redelivered.” At least once “Messages are never lost but may be redelivered.“ Exactly once “Each message is delivered once and only once.“
  89. 89. 89 Exactly Once principle
  90. 90. 90 Failure Scenario : Duplicate Writes
  91. 91. 91 Failure Scenario : Duplicate Processing
  92. 92. 92 Producer Guarantees - without exactly once guarantees P Broker 1 Broker 2 Broker 3 Topic1 partition1 Leader Follower Topic1 partition1 Topic1 partition1 Producer Properties acks=all min.insync.replicas=2 {key: 1234, data: abcd} - offset 3345 Failed ack Successful write
  93. 93. 93 Producer Guarantees - without exactly once guarantees P Broker 1 Broker 2 Broker 3 Topic1 partition1 Leader Follower Topic1 partition1 Topic1 partition1 Producer Properties acks=all min.insync.replicas=2 {key: 1234, data: abcd} - offset 3345 {key: 1234, data: abcd} - offset 3346 retry ack dupe!
  94. 94. 94 Producer Guarantees - with exactly once guarantees P Broker 1 Broker 2 Broker 3 Topic1 partition1 Leader Follower Topic1 partition1 Topic1 partition1 Producer Properties enable.idempotence=true max.in.flight.requests.per.connection=1 acks=all retries > 0 (preferably MAX_INT) (pid, seq) [payload] (100, 1) {key: 1234, data: abcd} - offset 3345 (100, 1) {key: 1234, data: abcd} - rejected, ack re-sent (100, 2) {key: 5678, data: efgh} - offset 3346 retry ack no dupe!
  95. 95. 95 Exactly once Idempotent Producer Transactions Isolation Level • Read committed • Read uncommitted 95
  96. 96. 96 Transactions!
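
     A hedged sketch of the transactional producer API; topic names and the transactional.id are assumptions:

        import org.apache.kafka.clients.producer.KafkaProducer;
        import org.apache.kafka.clients.producer.ProducerRecord;
        import java.util.Properties;

        public class TxDemo {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
                props.put("transactional.id", "tx-demo-1");       // enables transactions (implies idempotence)
                props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
                props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
                try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                    producer.initTransactions();
                    producer.beginTransaction();
                    try {
                        producer.send(new ProducerRecord<>("topic1", "k", "v1"));
                        producer.send(new ProducerRecord<>("topic2", "k", "v2"));
                        producer.commitTransaction(); // both writes become visible atomically
                    } catch (Exception e) {
                        producer.abortTransaction();  // read_committed consumers never see them
                    }
                }
            }
        }
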
  97. 97. 97 Exactly once made simple with Kafka Streams
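
     In Kafka Streams all of that machinery reduces to one configuration switch; a sketch (constant name as of Kafka 2.x, later versions add EXACTLY_ONCE_V2):

        import org.apache.kafka.streams.StreamsConfig;
        import java.util.Properties;

        public class EosConfig {
            public static Properties streamsProps() {
                Properties props = new Properties();
                // One line enables idempotent producers, transactions and
                // read_committed consumption for the whole topology
                props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
                return props;
            }
        }
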
  98. 98. 98 98 Kafka Streams Interactive Queries
  99. 99. 99 Interactive Queries App Streams API kTable = aStream .groupByKey() .reduce(reducer,materialize) From our App, how to query the state store? State Store Kafka Cluster
  100. 100. 100 Interactive Queries App Streams API store = kafkaStreams .store(name, types) value = store.get(key) From our App, how to query the state store? - Get the store « by name & types» - Then the value « by key » READ ONLY (Streams DSL) Kafka Cluster
  101. 101. 101 Interactive Queries App Streams API store = kafkaStreams .store(name, types) value = store.get(key) You can serve that value to your client Front End key Kafka Cluster
  102. 102. 102 Interactive Queries App Streams API store = kafkaStreams .store(name, types) value = store.get(key) We add App nodes to make it scale Which App to call to get the value ? Front End App Streams API App Streams API ? ? ? key Kafka Cluster
  103. 103. 103 Interactive Queries App Streams API store = kafkaStreams .store(name, types) value = store.get(key) We add App nodes to make it scale Which App to call to get the value ? è Any node è We shift the problem to the App Front End App Streams API App Streams API key Kafka Cluster
  104. 104. 104 Interactive Queries App Streams API metadata = kafkaStreams .metadataForKey(name,key) host = metadata.host() port = metadata.port() How does the App locate the value? - Thanks to the metadata exchanged with the coordinator - Some simple configuration is required Front End App Streams API App Streams API key Kafka Cluster Metadata
  105. 105. 105 Interactive Queries App Streams API metadata = kafkaStreams .metadataForKey(name,key) host = metadata.host() port = metadata.port() Once the data is located, the App forwards the call to the target node Front End App Streams API App Streams API key Kafka Cluster Metadata
  106. 106. 106 Interactive Queries App Streams API metadata = kafkaStreams .metadataForKey(name,key) host = metadata.host() port = metadata.port() Beware! The state store can be queried only in « RUNNING » state è Not during a rebalance è May impact your SLAs if you expose the data to your customers Front End App Streams API App Streams API key App Streams API Kafka Cluster
  107. 107. 107 Interactive Queries App Streams API metadata = kafkaStreams .metadataForKey(name,key) host = metadata.host() port = metadata.port() Solution ? Second App cluster, but: - More resources... - 1 more hop Front End key Kafka Cluster App Streams API App Streams API App Streams API App Streams API App (b) Streams API App (a) Streams API
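
     The interactive-queries flow from the slides above as one hedged Java sketch; the store name, key and remote-call helper are assumptions, and the API shown is the Kafka 2.x one (later releases replace it with StoreQueryParameters and queryMetadataForKey):

        import org.apache.kafka.common.serialization.Serdes;
        import org.apache.kafka.streams.KafkaStreams;
        import org.apache.kafka.streams.state.QueryableStoreTypes;
        import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;
        import org.apache.kafka.streams.state.StreamsMetadata;

        public class InteractiveQueryDemo {
            static Long lookup(KafkaStreams kafkaStreams, String key) {
                // Local read: works when this instance hosts the key's partition
                ReadOnlyKeyValueStore<String, Long> store = kafkaStreams.store(
                        "counts-store", QueryableStoreTypes.<String, Long>keyValueStore());
                Long local = store.get(key);
                if (local != null) return local;
                // Otherwise locate the owning instance via the group metadata and
                // forward the request (e.g. over HTTP) to metadata.host():metadata.port()
                StreamsMetadata metadata = kafkaStreams.metadataForKey(
                        "counts-store", key, Serdes.String().serializer());
                return fetchRemotely(metadata.host(), metadata.port(), key);
            }

            // Hypothetical helper standing in for the HTTP hop between app instances
            static Long fetchRemotely(String host, int port, String key) {
                return null;
            }
        }
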
  108. 108. 108 • subscribe() • poll() • send() • flush() Consumer, Producer • mapValues() • filter() • punctuate() Kafka Streams • Select…from… • Join…where… • Group by… KSQL Flexibility Simplicity Trade-offs
  109. 109. 109 KSQL for Data Exploration SELECT status, bytes FROM clickstream WHERE user_agent = 'Mozilla/5.0 (compatible; MSIE 6.0)';
  110. 110. 110 KSQL for Streaming ETL Fact 1 Fact 2 Fact 3 Fact 4 Fact 5 Fact 6 id 1 id 2 id 3 Business
  111. 111. 111 KSQL for Streaming ETL Fact 1 Fact 2 Fact 3 Fact 4 Fact 5 Fact 6 id 1 id 2 id 3 Fact X Fact Y Fact Z
  112. 112. 112 KSQL for Streaming ETL Fact 1 Fact 2 Fact 3 Fact 4 Fact 5 Fact 6 id 1 id 2 id 3 Fact X Fact Y Fact Z Fact A Fact B Fact C Fact D Fact E Fact K Fact L Fact M Fact N Id X
  113. 113. 113 KSQL for Streaming ETL CREATE STREAM vip_actions AS SELECT userid, page, action FROM clickstream c LEFT JOIN users u ON c.userid = u.user_id WHERE u.level = 'Platinum';
  114. 114. 114 Nested Types SELECT eventid, address.city FROM users WHERE address.state = 'CA';
  115. 115. 115 User Defined Functions (UDF) SELECT eventid, anomaly(sensorinput) FROM sensor @Udf(description = "apply analytic model to sensor input") public String anomaly(String sensorinput){ return your_logic; }
  116. 116. 116 KSQL for Anomaly Detection CREATE TABLE possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 SECONDS) GROUP BY card_number HAVING count(*) > 3;
  117. 117. 117
  118. 118. 118 Plenty of KSQL Recipes https://www.confluent.io/stream-processing-cookbook/
  119. 119. 119 Plenty of KSQL Recipes https://www.confluent.io/stream-processing-cookbook/
  120. 120. 120 Plenty of KSQL Recipes https://www.confluent.io/stream-processing-cookbook/
  121. 121. 121 KSQL: Enable Stream Processing using SQL-like Semantics Example Use Cases • Streaming ETL • Anomaly detection • Event monitoring Leverage Kafka Streams API without any coding required KSQL server Engine (runs queries) REST API CLIClients Confluent Control Center GUI Kafka Cluster Use any programming language Connect via CLI or Control Center user interface
  122. 122. 122 Is KSQL really Kafka Streams? ... yes!
  123. 123. 123 • subscribe() • poll() • send() • flush() Consumer, Producer • mapValues() • filter() • punctuate() Kafka Streams • Select…from… • Join…where… • Group by… KSQL Flexibility Simplicity Trade-offs
  124. 124. 124 Lowering the Bar to Enter the World of Streaming Kafka User Population Coding Sophistication Core Java developers Core developers who don’t use Java/Scala Data engineers, architects, DevOps/SRE BI analysts streams
  125. 125. 125 125 Schema
  126. 126. 126 The Challenge of Data Compatibility at Scale: from implicit to explicit! App 1 App 2 App 3 Many sources without a policy cause mayhem in a centralized data pipeline. Ensuring downstream systems can use the data is key to an operational stream pipeline. Example: date formats. Even within a single application, different formats can be presented. Incompatibly formatted message
  127. 127. 127 Schema Registry: Make Data Backwards Compatible and Future-Proof ● Define the expected fields for each Kafka topic ● Automatically handle schema changes (e.g. new fields) ● Prevent backwards incompatible changes ● Support multi-data center environments Elastic Cassandra HDFS Example Consumers Serializer App 1 Serializer App 2 ! Kafka Topic! Schema Registry
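
     A hedged producer-side sketch of Schema Registry integration using Confluent's Avro serializer; the registry URL, broker address and use of GenericRecord are assumptions:

        import io.confluent.kafka.serializers.KafkaAvroSerializer;
        import org.apache.avro.generic.GenericRecord;
        import org.apache.kafka.clients.producer.KafkaProducer;
        import java.util.Properties;

        public class AvroProducerDemo {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092");          // assumed local broker
                props.put("schema.registry.url", "http://localhost:8081"); // assumed local registry
                props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
                props.put("value.serializer", KafkaAvroSerializer.class.getName());
                // The serializer registers the record's Avro schema on first use;
                // the registry then rejects changes that break the compatibility policy
                KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props);
                producer.close();
            }
        }
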
  128. 128. 128 128 Deployment
  129. 129. 129 Which one do you prefer? • Zip • Yum/apt • Ansible • Docker • DC/OS • Helm-charts • Confluent Operator • ... Cloud!
  130. 130. 130 130 Tools
  131. 131. 131 Plenty! https://cwiki.apache.org/confluence/display/KAFKA/System+Tools https://github.com/dharmeshkakadia/awesome-kafka https://www.google.com/ 🙂
  132. 132. 132 132 Monitoring
  133. 133. 133 System Health Are all brokers and topics available? How much data is being processed? What can be tuned to improve performance? End-to-End SLA Monitoring Does Kafka process all events in under 15 seconds? Is the 8am report missing data? Are there duplicate events?
  134. 134. 134 Monitoring https://github.com/framiere/monitoring-demo
  135. 135. 135 Confluent Control Center – Cluster Health & Administration Cluster health dashboard • Monitor the health of your Kafka clusters and get alerts if any problems occur • Measure system load, performance, and operations • View aggregate statistics or drill down by broker or topic Cluster administration • Monitor topic configurations
  136. 136. 136 View consumer-partition lag across topics for a consumer group Alert on max consumer group lag across all topics Consumer Lag Monitoring 136
  137. 137. 137 137 Resources
  138. 138. 138 Confluent resources
  139. 139. 139 Optimizing Your Apache Kafka® Deployment https://www.confluent.io/white-paper/optimizing-your-apache-kafka-deployment/
  140. 140. 140 Resources - Confluent Enterprise Reference Architecture https://www.confluent.io/whitepaper/confluent-enterprise-reference-architecture/
  141. 141. 141 141 Community
  142. 142. 142 Resources – Community Slack and Mailing List https://slackpass.io/confluentcommunity https://groups.google.com/forum/#!forum/confluent-platform
  143. 143. 143 Confluent Blog
  144. 144. 144 Confluent Platform Demo : cp-demo https://github.com/confluentinc/cp-demo With security inside!
  145. 145. 145 Examples, Examples, Examples! https://github.com/confluentinc/examples
  146. 146. 146 A Kafka Story https://github.com/framiere/a-kafka-story
  147. 147. 147 Kafka Boom Boom https://github.com/Dabz/kafka-boom-boom
  148. 148. 148 148 Take Away
  149. 149. 149 Kafka Provides a Central Nervous System for the Modern Digital Enterprise Enabling companies to respond accurately and in real time to business events
  150. 150. 150 150 Thursday: Neil Avery KAFKA - THE ASYNCHRONOUS MICROSERVICES RUNTIME FOR STATE, SCALE AND PERFORMANCE Friday 14:30 - 15:15 - Florent & Loulou APACHE KAFKA : PATTERNS / ANTI-PATTERNS Friday 15:30 – 17:30 - Florent, Nicolas & Loulou APACHE KAFKA - LES MAINS DEDANS (hands-on)
