This talk introduces Kafka and dives deep into Kafka's principles: the design decisions that make Kafka fast, scalable, and highly reliable. The talk also covers how Kafka servers interact with Kafka clients.
It digs into Kafka's internals and analyzes why the design decisions were made the way they were. The talk is suitable for software engineers who have been exploring, or want to explore, the various job queues and message queues out there.
Speaker: Nguyen Quang Minh
- Software Engineer, Technical Lead @ Employment Hero
- Contributor to `ruby-kafka` (the most popular Kafka client for Ruby)
Grokking TechTalk #24: Kafka's principles and protocols
1. Kafka's principles and protocols
Minh Nguyen
Tech Lead @ Employment Hero
nguyenquangminh.info
2. Hello. My name is Minh
- A Ruby lover and a Golang amateur
- I’m a nerdy guy
- I love researching the internals of systems
- I love the open-source world
- And I am a cat owner
3. Agenda
- The problems we are solving at Employment Hero
- Fundamental concepts of Kafka
- Kafka producers
- Kafka consumers and consumer group
- Consumer group flow (if we have time)
- Introduction to Kafka protocols (if we have time)
4. The problems ...
- Employment Hero is a startup whose main product is an HR platform.
- It started in 2012 as a simple Ruby on Rails application developed by a few developers.
[Diagram: a single Ruby on Rails app containing Feature A, Feature B, Feature C]
5. The problems ...
- Now there are > 100 employees, > 30 developers, and multiple million dollars in funding.
- It has become a really huge system, consisting of hundreds of modules, a complicated frontend stack, and 2 mobile applications.
- Finally, we started to follow the microservice path in 2017.
[Diagram: the Ruby on Rails monolith (Grape, Sidekiq, ...) holding Features A-H, with a frontend mix of React, jQuery, Backbone, ?!]
6. The problems ...
- The features with good boundaries are gradually extracted into smaller services.
[Diagram: the Rails main app, with Feature A extracted into a Sinatra service and Feature B into a Golang service]
7. A concrete example
- When a user updates something, all the changes of the operation are captured.
- The user’s supervisor and our support team are able to view, filter and search the audits.
8. A concrete example
User A signs a contract:
- User A uploads a signature
- User A agrees to the contract terms
- User A uses the signature in the contract
- The contract is marked Completed
- User A is marked Onboarded
9. A concrete example
- A request could generate dozens of audits
- Each audit must go through a data pipeline:
+ Persistent storage
+ Full-text search indexing
+ Government reporting
- There is too much work in a single request
10. Our solution
[Diagram: the Main app produces audit messages (“User A signs contract”, “User A uploads a signature”, “User A agrees to the contract terms”, ...) to a Message Queue; an Audit service consumes them and writes to Postgres and ElasticSearch]
11. The message queue
- The message queue must be:
+ Highly available.
+ Durable.
+ Scalable.
+ Fast. Extremely fast.
- After a lot of consideration, we chose Kafka
12. What is Kafka?
- An open-source distributed streaming platform
- Acts as a message queue, letting us publish and subscribe to streams of records
- Allows processing the record streams in real time
- Able to connect to external systems for importing / exporting
14. Kafka’s fundamental concepts
- Kafka organizes the messages by the concept of Topic.
- Each topic has many Partitions.
- Each partition is a list of durable messages.
- When a message is sent to Kafka under a topic, the message is “sharded” to one partition of the topic.
- The message partition assignment is decided by the producers.
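The key-to-partition decision in the last bullet can be sketched in a few lines of Ruby. This is a hedged illustration only: the real Java client hashes the message key with murmur2, CRC32 stands in here to keep the sketch dependency-free, and `partition_for` is our own name.

```ruby
require "zlib"

# Illustrative sketch: route a keyed message to a partition by hashing the
# key. The same key always lands on the same partition, which preserves
# per-key ordering. (Real clients use murmur2; CRC32 is a stand-in.)
def partition_for(key, num_partitions)
  Zlib.crc32(key) % num_partitions
end

# Messages without a key are typically spread round-robin instead.
puts partition_for("user-a", 4) == partition_for("user-a", 4)  # same key, same partition
```

Because the producer alone makes this decision, brokers never need to coordinate on routing.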
16. Kafka’s fundamental concepts
[Diagram: the Audit topic with 4 partitions; the producer routes the message “User A uploads a signature” to partition 4 as message 5]
17. Kafka’s fundamental concepts
- The partitions could be distributed to multiple machines. Each machine is called a Broker.
- Each broker could have 0, 1 or many partitions of the same topic, or even partitions of different topics.
19. Kafka’s fundamental concepts
- The messages are persisted to hard disk
- Kafka supports Replication to ensure high availability and fault tolerance.
- Each partition could have many replicas, based on the replication factor.
- The replicas are not necessarily on the same nodes
21. Kafka’s fundamental concepts
- Only the leader partitions are allowed to receive the messages.
- Then they sync the messages to the replicas.
22. Kafka concepts
[Diagram: the Audit topic with Replication Factor = 2; partitions 1-4 are spread across brokers 101-103, each with a replica on another broker; the producer sends message 5 (“User A uploads a signature”) to the leader of partition 4, which syncs it to its replica]
23. Kafka’s fundamental concepts
- When a leader partition dies, one of the replicas is elected to become the new leader partition.
- When that partition comes back, it becomes a replica and fetches the missing data from the others.
- All of this leader-replica mechanism is handled by Apache Zookeeper
24. Kafka’s fundamental concepts
[Diagram: broker 103 dies; the replicas of partitions 3 and 4 on the surviving brokers are elected as the new leaders]
26. Kafka Producers
- Kafka Producers try to be simple
- At the beginning, the producers fetch the cluster metadata:
+ List of brokers
+ Interesting topics and their partitions, replicas
- They interact directly with various brokers
- There are no centralized coordinators
27. Kafka Producers
[Diagram: Producer A and Producer B each write directly to the brokers holding the leader partitions they target, across brokers 101-103]
28. Kafka Producers
- Remove the write-bottleneck completely.
- Each broker receives a reasonable number of messages
- Want to scale? Add more partitions and more brokers.
- The scaling is nearly linear
29. Kafka Consumers
- Kafka consumers are much more complicated.
- Just like producers, the consumers start their operations by fetching the metadata.
- Each consumer is able to connect to multiple brokers and is encouraged to read from replicas.
- Each broker serves a subset of the partitions of the topics the consumer is interested in
31. Kafka Consumers
- The workload between brokers is balanced, reducing the read bottleneck
- Each consumer has plenty of replicas to read from.
- This increases the availability and helps balance the workload
32. Kafka Consumers
[Diagram: the Audit topic spread over brokers 101-104; Services A, B and C each read partitions 1 and 2 from different leader or replica copies]
33. Kafka Consumer Group
- To help consumers scale easily, the concept of Consumer Group is introduced
- Each consumer belongs to a Consumer Group
- Each message is broadcast to all the groups
- Each group member exclusively handles the messages from a partition
36. Kafka Consumer Group
- Each consumer is able to handle messages from more than 1 partition.
- Guarantee all partitions are covered
- Guarantee the message order within a partition
- The members in the group decide how to distribute the messages by themselves.
- Sometimes, Kafka is described as “dumb broker, smart consumer”
37. Kafka Consumer Group
- The workload is load-balanced between consumers of the same group
- Want to scale? Increase brokers, increase partitions and increase the number of consumers
- Rule of thumb:
- Number of consumers <= Number of partitions
38. Let’s get back to our example
[Diagram: User A’s and User B’s actions (“User A uploads a signature”, “User A agrees to the contract terms”, ..., “User A is marked Onboarded”) are produced by the Main app to partitions 1-3 across brokers 101-102; two Audit consumers in the same group share the partitions]
39. Kafka Consumer Group
- All those things about consumers and producers satisfy the last two characteristics:
+ Scalability
+ High performance (half of the story)
40. Consumer group flow
- At a time, there is a special broker that takes care of a group, called the group coordinator
- The group coordinator is chosen randomly. Any broker can become the group coordinator of a group
- The coordinator handles all group operations: join group, sync group, heartbeat, commit offsets, etc.
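For the curious, the choice is effectively random per group but deterministic in practice: in real Kafka the coordinator is the broker leading the `__consumer_offsets` partition the group id hashes to (Java's `String#hashCode`, absolute value, modulo `offsets.topic.num.partitions`, 50 by default). A hedged sketch, with the hash reimplemented for ASCII group ids:

```ruby
# Java's String#hashCode (h = 31*h + c) with 32-bit signed overflow,
# reimplemented for ASCII strings.
def java_string_hash(s)
  h = 0
  s.each_byte { |b| h = (h * 31 + b) & 0xFFFFFFFF }
  h >= 0x80000000 ? h - 0x100000000 : h
end

# The group coordinator is the leader of this __consumer_offsets partition.
def coordinator_partition(group_id, offsets_partitions = 50)
  java_string_hash(group_id).abs % offsets_partitions
end

puts java_string_hash("abc")   # => 96354, matching Java's "abc".hashCode
puts coordinator_partition("audit-service")
```

Because every broker can compute this, any broker can answer a “who is my coordinator?” request.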
42. Consumer group flow
1. Ask the bootstrap broker about the group coordinator via the Group Coordinator API. For example: broker 101 is the group coordinator.
[Diagram: the Audit service runs 4 consumers; brokers 101 and 102 hold the topic’s 4 partitions]
43. Consumer group flow
2. Send a join group request to the group coordinator with the consumer’s supported protocols.
[Diagram: one consumer of the Audit service sends Join to broker 101]
44. Consumer group flow
3. The new consumer is blocked by the group coordinator. The coordinator waits for the “other” participants. Typically, it waits until all old group members send join requests, or a timeout is exceeded.
[Diagram: the joining consumer is Blocked on broker 101]
45. Consumer group flow
4. After the group coordinator receives the join group request, the other consumers are notified about the new member (via heartbeat, commit offset, etc.). They are required to send join group requests again.
[Diagram: the existing consumers receive “Error! Need to re-join” while the new consumer stays Blocked]
47. Consumer group flow
4 (cont.). When all members are in, or a timeout is exceeded, the group coordinator releases the block and returns responses back to the members.
48. Consumer group flow
5. A lucky member is chosen to become this generation’s group leader. Its response attaches the list of group members and each member’s metadata.
[Diagram: one consumer is marked Leader]
49. Consumer group flow
6. The group leader assigns the workload to each member based on the members’ metadata. The other members don’t have to do this task.
[Diagram: the Leader computes the assignment]
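The leader’s assignment step can be sketched as a simple range-style strategy, one of several strategies real clients ship; the function name and layout here are illustrative: sort the members, then hand each a contiguous slice of partitions, giving the first members one extra when the division is uneven.

```ruby
# Illustrative range-style assignment computed on the group leader:
# sorted members each get a contiguous slice of the partition list.
def range_assignment(member_ids, partition_ids)
  members = member_ids.sort
  per, extra = partition_ids.size.divmod(members.size)
  offset = 0
  members.each_with_index.to_h do |member, i|
    count = per + (i < extra ? 1 : 0)   # the first `extra` members get one more
    slice = partition_ids[offset, count]
    offset += count
    [member, slice]
  end
end

p range_assignment(%w[c1 c2 c3], [0, 1, 2, 3])
# c1 gets two partitions; c2 and c3 get one each
```

Every member runs the same deterministic code, but only the leader’s result is distributed, so the group never needs broker-side assignment logic.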
50. Consumer group flow
7. All members continue by sending sync group requests. Like the join group request, sync group is a blocking request. The leader’s request attaches the member assignment.
[Diagram: all 4 consumers send Sync to broker 101]
51. Consumer group flow
8. Each member receives the sync group response. This response includes the current member assignment.
52. Consumer group flow
9. Finally, each consumer subscribes to the partitions it is assigned. The new consumer becomes a group member.
53. Consumer group flow
- Provides flexibility and gives consumers more power
- However, with great power comes great responsibility. The consumer clients are usually complicated and hard to implement correctly and completely!
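The whole flow above can be condensed into a toy in-memory model. This is illustrative only: there is no networking, no timeouts, and every class and method name here is made up for the sketch.

```ruby
# Toy model of the join/sync flow: the coordinator collects the members,
# bumps the generation, picks a leader, then hands out the leader's
# assignment through the sync responses.
class ToyCoordinator
  attr_reader :generation, :leader, :members

  def initialize
    @generation = 0
  end

  # Steps 2-5: all members (re-)join; a new generation starts and the
  # first member (sorted) is arbitrarily picked as leader.
  def join(member_ids)
    @generation += 1
    @members = member_ids.sort
    @leader = @members.first
  end

  # Steps 7-8: the leader submits the full assignment; each member then
  # receives only its own share in the sync response.
  def sync(assignment)
    @assignment = assignment
  end

  def assignment_for(member_id)
    @assignment[member_id]
  end
end

# Leader-side logic (step 6): round-robin partitions over sorted members.
def assign(member_ids, partition_ids)
  member_ids.sort.each_with_index.to_h do |m, i|
    [m, partition_ids.select.with_index { |_, j| j % member_ids.size == i }]
  end
end

coord = ToyCoordinator.new
coord.join(%w[consumer-a consumer-b])
coord.sync(assign(coord.members, [0, 1, 2, 3]))
p coord.assignment_for("consumer-a")  # => [0, 2]
```

The real protocol adds the hard parts the model skips: generation fencing of stale members, heartbeats, timeouts, and the blocking semantics of join and sync, which is exactly why full-featured clients are hard to write.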
54. Kafka Protocol
- Kafka provides various powerful APIs for the clients
- It implements its own binary protocol over TCP
- The protocol follows the request - response model
- There are about ~20 APIs in the newest version
- Each API has its own version, and Kafka ensures backward compatibility
55. Kafka Protocol
- Each field in the request / response has a type
- There are primitive types:
- int8, int16, int32, int64
- The composed types:
- string: [size in int16][string]
- bytes: [size in int32][bytes]
- Array is supported: [size in int32][e1][e2]...
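These composed types map directly onto Ruby’s big-endian pack directives (`s>` for int16, `l>` for int32); the helper names below are our own:

```ruby
# Size-prefixed composed types from the slide, encoded with big-endian packs.
def kafka_string(s)
  [s.bytesize].pack("s>") + s         # [size in int16][string]
end

def kafka_bytes(b)
  [b.bytesize].pack("l>") + b         # [size in int32][bytes]
end

def kafka_array(encoded_elements)
  [encoded_elements.size].pack("l>") + encoded_elements.join  # [size in int32][e1][e2]...
end

puts kafka_string("Views").unpack1("H*")  # => "00055669657773"
```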
56. Request format
Request Size (int32)
API Key (int16)
API Version (int16)
Correlation Id (int32)
ClientId (string)
Each API has a numeric API key
Each API has a specific version, which defines the body’s structure
The same as Correlation ID in the request
TopicMetadataRequest Number of topics (int32)
Topic 1 (string)
Topic 2 (string)
57. Request example
Request Size (int32): 0x0000002c = 44
API Key (int16): 0x0003 = 3
API Version (int16): 0x0000 = 0
Correlation Id (int32): 0x00000000 = 0
ClientId (string): 0x000f 0x67726f6b6b696e672d636c69656e74 = “grokking-client”
Number of topics (int32): 0x00000002 = 2
Topic 1 (string): 0x0005 0x5669657773 = “Views”
Topic 2 (string): 0x0006 0x4f7264657273 = “Orders”
Final request:
0x0000002c0003000000000000000f67726f6b6b696e672d636c69656e74000000020005566965777300064f7264657273
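The byte layout above can be reproduced with a few lines of Ruby (`s>` and `l>` are the big-endian int16 and int32 pack directives; the `kstring` helper name is ours):

```ruby
# Build the TopicMetadata request from the example, field by field.
def kstring(s)
  [s.bytesize].pack("s>") + s   # [size in int16][string]
end

body = [3].pack("s>") +          # API key: TopicMetadata = 3
       [0].pack("s>") +          # API version 0
       [0].pack("l>") +          # correlation id
       kstring("grokking-client") +
       [2].pack("l>") +          # number of topics
       kstring("Views") +
       kstring("Orders")

request = [body.bytesize].pack("l>") + body  # prepend the request size (44)
puts request.unpack1("H*")
# => "0000002c0003000000000000000f67726f6b6b696e672d636c69656e74000000020005566965777300064f7264657273"
```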
58. Response format
Response Size (int32): 301 in the example below
Correlation Id (int32): 0 in the example below
TopicMetadataResponse body:
[Brokers]: Node ID (int32), Host (string), Port (int32)
[Topics]: ErrorCode (int16), Topic name (string)
[Partitions]: ErrorCode (int16), Partition ID (int32), Replicas (array of int32), Isr (array of int32)
Final response:
0x00000012d0000000000000003000003e9000c61633966623966343839343000002384000003eb000c35363863616133363663383033000023 84000003ea000c383262343130346666343761000023840000000200000005566965 77730000000500 0000000002000003ea00000002000003ea000003eb00000002000003ea000003eb000000000004000003e900000002000003e9000003eb00000002000003e9000003eb000000000001000003e900000002000003e9000003ea00000002000003e9000003ea000000000003000003eb00000002000003eb000003ea00000002000003eb000003ea000000000000000003eb00000002000003eb000003e900000002000003eb000003e9000000064f726465727300000001000000000000000003e900000001000003e900000001000003e9
64. Kafka is not a silver bullet
- Kafka is fast and crazily scalable.
- But it is not easy to use.
- The client libraries are just tools. They don’t solve all of our problems.
- Therefore, it is great to understand the internals to achieve more with Kafka.
65. What’s next?
- Kafka Streams
- Kafka transaction and exactly-once delivery
- Kafka internal architecture and implementations