DevOps Fest 2020. Serhii Kalinets. Building Data Streaming Platform with Apache Kafka
Apache Kafka is all the hype right now. More and more companies are adopting it as a message bus. Yet Kafka can do much more than just transport data. Its real power and beauty come out when Kafka becomes the central nervous system of your architecture. It is fast, reliable, and flexible enough for many different use cases.
In this talk Serhii shares his experience of building a data streaming platform. We discuss how Kafka works, how it should be configured, and what trouble you can get into when Kafka is used suboptimally.
1. Building Data Streaming Platform with Apache Kafka
Serhii Kalinets, System Architect
2. History of Kafka
Created at LinkedIn
The creators then founded Confluent
Why the name Kafka? Jay Kreps (Confluent CEO): "I thought that since Kafka was a system optimized for writing, using a writer's name would make sense. I had taken a lot of lit classes in college and liked Franz Kafka. Plus the name sounded cool for an open source project."
3. Kafka use cases
Message broker
Logs
Commit log
Streaming
4. What is Kafka
A publish/subscribe messaging system that has an interface typical of messaging systems, but a storage layer more like a log-aggregation system
5. Messaging System
Messages
Topics
Partitions
Producers
Consumers
6. Messages
Key/value pairs; both key and value can be null
Kafka treats both just as bytes
Serialization / deserialization happens on the clients
The Confluent broker can validate messages against a schema
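Because serialization lives in the client, the producer is configured with serializer classes. A minimal sketch, assuming a local broker on localhost:9092 and a hypothetical topic "events":

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        // Serialization happens here, on the client; the broker only sees bytes.
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key and value are each allowed to be null.
            producer.send(new ProducerRecord<>("events", "user-42", "logged_in"));
        }
    }
}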
7. https://kafka.apache.org/intro
8. How many partitions?
What is the throughput you expect to achieve for the topic?
What is the maximum throughput you expect to achieve when consuming from a single partition?
Throughput for producers can usually be ignored (producers are rarely the bottleneck)
9. How many partitions?
Adding partitions later can be very challenging
Consider the number of partitions you will place on each broker, and the available disk space and network bandwidth per broker
Avoid overestimating: each partition uses memory and other resources on the broker and increases the time for leader elections
10. Producers
Can specify the partition explicitly or implicitly (via partitioners)
The decision is taken on the producer side
Different SDKs might have different default partitioners
Adding new partitions can change partition assignments
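Both options side by side, reusing the producer from the sketch above (the partition number and topic are hypothetical):

// Explicit: the caller picks partition 0 directly.
producer.send(new ProducerRecord<>("events", 0, "user-42", "payload"));
// Implicit: the configured partitioner hashes the key to choose a partition.
producer.send(new ProducerRecord<>("events", "user-42", "payload"));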
11. Producer guarantees
Kafka guarantees ordering within a partition for producers
Ordering can be broken by retries if max.in.flight.requests.per.connection > 1
Idempotent producers (retries will not cause duplicates)
Transactions (messages sent within a transaction become visible to consumers only after the transaction completes)
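A sketch of the corresponding settings, added to the producer Properties from the earlier sketch (shown with the safe values; defaults differ between client versions):

props.put("enable.idempotence", "true"); // retries cannot introduce duplicates
props.put("acks", "all");                // wait for all in-sync replicas
// Ordering under retries: with idempotence off, anything > 1 here can reorder;
// with idempotence on, values up to 5 still preserve ordering.
props.put("max.in.flight.requests.per.connection", "5");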
12. Consumer Groups
Consumers share a common group.id
Group membership is managed by the group coordinator (a broker); one consumer acts as the group leader
Poll loop
Simple for the developer: while (true) { consumer.poll(); processMessages(); }
Complicated implementation: coordination, rebalancing, heartbeats, etc.
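The poll loop spelled out as a minimal sketch (group id and topic name are placeholders):

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "my-group"); // all consumers sharing this id form one group
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(List.of("events"));
    while (true) {
        // poll() also drives heartbeats and rebalancing behind the scenes
        for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(100))) {
            System.out.printf("%s -> %s%n", record.key(), record.value());
        }
    }
}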
13. Commits and offsets
Consumers commit their last offsets to Kafka
Automatic / manual commits
Sync / async commits
auto.offset.reset controls where to start reading when there is no committed offset (earliest or latest)
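Manual commits build on the same loop; a sketch of the key settings and calls:

props.put("enable.auto.commit", "false");   // take over commit responsibility
props.put("auto.offset.reset", "earliest"); // used only when no committed offset exists

// ... in the poll loop, after a batch has been processed:
consumer.commitSync();     // blocks until the offsets are stored (safe, slower)
// consumer.commitAsync(); // non-blocking alternative; does not retry on failure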
14. Datastore
Partitions
Replicas
Segments
Compaction
15. Replication
16. Default topic configuration
Replication factor = 3
min.insync.replicas = 2
In producers: acks = all
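These defaults applied when creating a topic through the AdminClient; a sketch (topic name and partition count are arbitrary, and the .get() calls assume a method that declares throws Exception):

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
try (Admin admin = Admin.create(props)) {
    NewTopic topic = new NewTopic("events", 6, (short) 3) // 6 partitions, replication factor 3
            .configs(Map.of("min.insync.replicas", "2"));
    admin.createTopics(List.of(topic)).all().get(); // block until the operation completes
}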
17. Segments
Physical files with raw data
Kafka keeps open file handles to all segments, including inactive ones
Writes go only to the active segment
Retention and compaction are applied only to inactive segments
18. Retention
Kafka does not wait until all consumers have read the data
log.retention.ms -- retention by time
log.retention.bytes -- retention by size (per partition)
log.segment.bytes -- size at which the active segment is closed
log.segment.ms -- time after which the active segment is closed
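The log.* names above are broker-wide defaults; the same limits exist as per-topic overrides (retention.ms, retention.bytes, segment.bytes, segment.ms) and can be changed at runtime. A sketch reusing the Admin client from above:

import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "events");
AlterConfigOp setRetention = new AlterConfigOp(
        new ConfigEntry("retention.ms", "604800000"), // keep data for 7 days on this topic
        AlterConfigOp.OpType.SET);
admin.incrementalAlterConfigs(Map.of(topic, List.of(setRetention))).all().get();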
19. Compaction: removes old data
20. Compaction
min.compaction.lag.ms -- minimum age before a message may be compacted
To delete an event, send a new message with the same key and a null value (a tombstone)
delete.retention.ms -- how long a tombstone is retained (the default is 24 hours)
The compaction process is configurable (number of threads, resource consumption, frequency, etc.)
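A tombstone is just an ordinary send with a null value, e.g. with the producer sketched earlier (topic and key are hypothetical):

// Same key, null value: compaction will eventually remove every earlier message
// with this key, and later (after delete.retention.ms) the tombstone itself.
producer.send(new ProducerRecord<String, String>("locations", "vehicle-42", null));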
21. Brokers
The cluster uses ZooKeeper to handle membership
One broker is the controller (leader); it is responsible for partition leader election
There are plans to get rid of ZooKeeper (KIP-500)
22. Kafka guarantees
Durability and high availability
Message ordering within a partition
At least once / exactly once delivery
Transactions
23. Kafka Streams
A high-level DSL for working with Kafka topics as streams
Currently JVM only (Java / Scala)
The DSL is rather simple (map / join / reduce style)
Supports joins, filters, aggregations
Streams and tables
Handles all the low-level stuff
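A minimal DSL sketch (application id and topic names are placeholders) showing the shape of a filter/map pipeline:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "logins-app"); // placeholder id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> logins = builder.stream("logins");
logins.filter((key, value) -> value != null)    // drop tombstones
      .mapValues(value -> value.toUpperCase())  // trivial per-record transform
      .to("logins-uppercased");                 // write results to another topic

KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start(); // the library handles partitioning, rebalancing, state, etc.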
24. Kafka Streams
25. Kafka Connect
A framework for connecting Kafka with external systems such as databases, key-value stores, search indexes, and file systems
Built on top of the Kafka clients
Deploys as a cluster (e.g. via operators / Helm charts)
Configurable via a REST endpoint
26. Add a connector to MySQL
echo '{"name": "mysql-login-connector",
  "config": {"connector.class": "JdbcSourceConnector",
    "connection.url": "jdbc:mysql://127.0.0.1:3306/test?user=root",
    "mode": "timestamp",
    "table.whitelist": "login",
    "validate.non.null": false,
    "timestamp.column.name": "login_time",
    "topic.prefix": "mysql."}}' |
curl -X POST -d @- http://localhost:8083/connectors --header "Content-Type: application/json"
27. https://kafka.apache.org/intro
28. ksqlDB
An event streaming database
SQL on top of Kafka Streams + materialized views
29. ksqlDB Components
Streams: immutable sequences of events
Tables: mutable collections of events
Stream processing: transform, filter, aggregate, and join
Push queries let you subscribe to a query's results as they change in real time
Pull queries let you fetch the current state of a materialized view
30. Creating tables
CREATE TABLE currentCarLocations (
  vehicleId VARCHAR,
  latitude DOUBLE,
  longitude DOUBLE
) WITH (
  kafka_topic = 'locations',
  partitions = 3,
  key = 'vehicleId',
  value_format = 'json'
);
31. Queries
SELECT vehicleId, latitude, longitude
FROM currentCarLocations
WHERE ROWKEY = '6fd0fcdb'
EMIT CHANGES;
(EMIT CHANGES makes this a push query that streams every update; without it, the same SELECT against the materialized table is a one-shot pull query.)
32. Advantages
Non-developers can write their own queries
Read from and write to many data sources
Much less code -- fewer bugs
Data exploration
33. Our Roadmap
Consumer / producer API
Kafka Streams / Connect ← we are here
ksqlDB
34. Thanks!
serhii.kalinets@pm.bet
@skalinets
