This document discusses messaging queues and compares Kafka and Amazon SQS. It begins by explaining what a messaging queue is and lists software that can be used, including Kafka, SQS, SNS, and RabbitMQ. It then explains why messaging queues are useful: they allow asynchronous processing and the handling of failed calls. The document goes on to describe Kafka, a distributed streaming platform used by companies like LinkedIn, Twitter, and Netflix. It defines Kafka terminology and discusses how producers and consumers work. Finally, it compares SQS and Kafka on message ordering, delivery guarantees, retention, security, cost, and throughput.
2. What is a Messaging Queue?
Which software is the best fit for our service?
- Amazon SQS, Amazon SNS, Apache Kafka, RabbitMQ, IBM MQ
Can we create our own messaging queue?
A queue contains a sequence of messages, sent between applications, awaiting their turn to be processed.
A message is the data to be sent from producer to consumer.
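The definition above can be illustrated with a minimal in-process sketch. It uses Python's standard `queue.Queue` as a stand-in for a real broker, and the names `message_queue`, `producer`, and `consumer` are hypothetical, chosen only for this example.

```python
import queue
import threading

# Hypothetical in-process stand-in for a broker's queue (not a real broker).
message_queue = queue.Queue()
processed = []


def producer(messages):
    """Put each message on the queue without waiting for it to be processed."""
    for message in messages:
        message_queue.put(message)
    message_queue.put(None)  # sentinel: tells the consumer to stop


def consumer():
    """Take messages off the queue one at a time, in arrival order."""
    while True:
        message = message_queue.get()
        if message is None:
            break
        processed.append(message.upper())


consumer_thread = threading.Thread(target=consumer)
consumer_thread.start()
producer(["order created", "order paid", "order shipped"])
consumer_thread.join()
print(processed)
```

The producer returns as soon as its messages are queued; the consumer works through them in order on its own thread.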
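3. Why a Messaging Queue?
Why can't we have REST APIs everywhere?
Sync call: the caller waits until the called service responds.
Failed case: if the called service is down, the call fails and the request is lost unless the caller retries.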
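The contrast between a direct synchronous call and a queued call in the failed case might look like the sketch below. Everything here is an assumption made for illustration: `flaky_service` is a hypothetical downstream service that fails once, and `drain` is a hypothetical worker loop that puts a failed message back on the queue.

```python
from collections import deque

attempts = {"count": 0}


def flaky_service(payload):
    """Hypothetical downstream service that fails on its first call."""
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise ConnectionError("service temporarily unavailable")
    return f"handled {payload}"


def sync_call(payload):
    """Direct call: a failure surfaces to the caller immediately."""
    try:
        return flaky_service(payload)
    except ConnectionError:
        return None  # the request is lost unless the caller retries itself


def drain(pending, max_attempts=3):
    """Queued call: a failed message is put back and retried later."""
    results = []
    tries = {}
    while pending:
        payload = pending.popleft()
        tries[payload] = tries.get(payload, 0) + 1
        try:
            results.append(flaky_service(payload))
        except ConnectionError:
            if tries[payload] < max_attempts:
                pending.append(payload)  # the message survives the failure
    return results


sync_result = sync_call("request-1")
attempts["count"] = 0  # reset so the service fails once again
queued_result = drain(deque(["request-1"]))
print(sync_result, queued_result)
```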
5. Kafka
● Distributed streaming platform.
● Real-time streaming of data.
● Can handle billions of messages in a day.
● High throughput, reliability, replication capabilities.
● Amazon MSK - Amazon Managed Streaming for Apache Kafka.
● Used by LinkedIn, Twitter, Netflix, etc.
6. Kafka - Terminologies
● Kafka Cluster - A cluster of one or more servers (Kafka brokers) that keeps the load balanced.
● Kafka Broker - A broker is a Kafka server. Brokers share information with each other.
● Bootstrap Server - The server used for the initial connection to the Kafka cluster. Specified as Host:Port.
● Producer - Produces messages and sends them to a topic (partition).
● Consumer - Polls messages from the topic (partition).
● Consumer Group - A message is read once by each consumer group. Similar to SNS handlers.
7. Kafka - Terminologies
● Topic - Stores or publishes a particular stream of data. A topic can have one or more partitions.
● Partition - Supports parallelism for fast processing. Similar to an SQS message group.
● Segment - Data is stored in segments. A partition is divided into multiple segments.
● Offset - Uniquely identifies a message within a partition. It starts from 0 for each partition.
● Zookeeper - Manages leader election for brokers. Each partition has its own leader.
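A toy model can make the relationship between topics, partitions, offsets, and consumer groups concrete. This is a sketch under simplifying assumptions, not Kafka's implementation: the classes `Topic` and `ConsumerGroup` are hypothetical, each partition is a plain list, and each group tracks a single read position per partition.

```python
class Topic:
    """Toy topic: each partition is an append-only list (hypothetical model)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition_id, message):
        """Append a message and return its offset within that partition."""
        log = self.partitions[partition_id]
        log.append(message)
        return len(log) - 1


class ConsumerGroup:
    """Toy consumer group: tracks one read position per partition."""

    def __init__(self, group_id, topic):
        self.group_id = group_id
        self.topic = topic
        self.positions = [0] * len(topic.partitions)

    def poll(self, partition_id):
        """Return messages this group has not read yet, then advance its position."""
        log = self.topic.partitions[partition_id]
        start = self.positions[partition_id]
        self.positions[partition_id] = len(log)
        return log[start:]


topic = Topic("orders", num_partitions=2)
first = topic.append(0, "order-1")   # offset 0 in partition 0
second = topic.append(0, "order-2")  # offset 1 in partition 0
other = topic.append(1, "order-3")   # offsets restart at 0 in partition 1

billing = ConsumerGroup("billing", topic)
shipping = ConsumerGroup("shipping", topic)
billing_batch = billing.poll(0)
shipping_batch = shipping.poll(0)    # a second group sees the same messages
billing_again = billing.poll(0)      # nothing new for the first group
```

Each group reads every message once, independently of the other group, which is the behaviour the Consumer Group definition describes.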
8. Producer
● Sends data with topic only
  ○ The producer's partitioner decides the partition.
  ○ By default a round-robin algorithm is used. We can implement our own.
● Sends data with topic and partition id
  ○ Directly selects the partition and sends the data.
● Sends data with topic and partition key
  ○ Creates a hash value of the partition key and decides the partition id on that basis.
  ○ It is similar to the SQS message group id.
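The three ways of choosing a partition can be sketched as follows. This is an illustration, not Kafka's partitioner: the class `Partitioner` is hypothetical, and `zlib.crc32` is used as a stand-in hash (Kafka's own default partitioner hashes keys with murmur2).

```python
import itertools
import zlib


class Partitioner:
    """Hypothetical partitioner covering the three cases from the slide."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def choose(self, partition_id=None, key=None):
        # Case 2: an explicit partition id is used directly.
        if partition_id is not None:
            if not 0 <= partition_id < self.num_partitions:
                raise ValueError(f"no such partition: {partition_id}")
            return partition_id
        # Case 3: the key is hashed, so equal keys land in the same partition.
        # crc32 is an assumption for this sketch, not Kafka's hash function.
        if key is not None:
            return zlib.crc32(key.encode("utf-8")) % self.num_partitions
        # Case 1: topic only, so partitions are taken in round-robin order.
        return next(self._round_robin)


partitioner = Partitioner(num_partitions=3)
keyless = [partitioner.choose() for _ in range(4)]
explicit = partitioner.choose(partition_id=2)
keyed_a = partitioner.choose(key="user-42")
keyed_b = partitioner.choose(key="user-42")
print(keyless, explicit, keyed_a == keyed_b)
```

Because the partition is derived from the key alone, all messages with the same key stay in one partition and therefore keep their relative order.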
9. Kafka Broker Data Storage
Segments
Segments are named by their base offset. The base offset of a segment is greater than all offsets in previous segments and less than or equal to all offsets in that segment.
● segment.index - maps offsets to their messages' positions in the segment log.
● segment.log - stores the actual messages.
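Because segments are named by base offset, finding the segment that holds a given offset is a search over sorted base offsets. The sketch below assumes a hypothetical partition with three segments; the 20-digit zero-padded file names follow the naming Kafka uses on disk.

```python
import bisect

# Hypothetical base offsets of the segments in one partition, kept sorted.
segment_base_offsets = [0, 100, 250]


def find_segment(offset):
    """Return the base offset of the segment that holds `offset`."""
    index = bisect.bisect_right(segment_base_offsets, offset) - 1
    if index < 0:
        raise ValueError(f"offset {offset} is before the first segment")
    return segment_base_offsets[index]


def segment_file_names(base_offset):
    """Build the .log and .index file names for a segment's base offset."""
    stem = f"{base_offset:020d}"  # zero-padded to 20 digits
    return f"{stem}.log", f"{stem}.index"


print(find_segment(99), find_segment(100), find_segment(300))
print(segment_file_names(find_segment(120)))
```

Once the segment is known, the index file narrows the search further by mapping an offset to a position in the log file.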
10. Consumer
Partition Allocation
● One consumer: all partitions are assigned to the only consumer.
● Fewer consumers than partitions: partitions are divided equally and assigned to the consumers.
● As many consumers as partitions: each partition maps to one consumer.
● More consumers than partitions: the extra consumers become idle.
● Each partition is consumed by only a single consumer from the group.
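The allocation cases above can be reproduced with a simple round-robin assignment. This is a minimal sketch, assuming a hypothetical `assign_partitions` helper; Kafka's real assignors (range, round-robin, sticky) differ in detail.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over consumers; each partition gets exactly one owner."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment = {consumer: [] for consumer in consumers}
    for position, partition in enumerate(partitions):
        owner = consumers[position % len(consumers)]
        assignment[owner].append(partition)
    return assignment


one_consumer = assign_partitions([0, 1, 2, 3], ["c1"])
two_consumers = assign_partitions([0, 1, 2, 3], ["c1", "c2"])
too_many = assign_partitions([0, 1], ["c1", "c2", "c3"])
print(one_consumer, two_consumers, too_many)
```

In the last case `c3` receives an empty list, which corresponds to the idle consumer on the slide.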
11. Consumer
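Reads messages from a partition
● Offset: from-beginning
  ○ On restart, reads from the first available offset.
  ○ This is not necessarily the very first message, as Kafka has a default retention of 7 days and older messages may already be deleted.
● Offset: earliest
  ○ On restart, reads from the last committed offset. If the group has no committed offset, it starts from the earliest available one.
  ○ Auto commit: offsets are committed every 5 sec by default, as part of the poll call.
  ○ Manual commit: the consumer sends the ack to the broker manually, with the offset.
● Offset: latest
  ○ On restart, reads from the latest message when the group has no committed offset.
  ○ Used for real-time cases.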
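How a consumer picks its starting position can be sketched as a small decision function. In Kafka clients the `auto.offset.reset` policy applies only when the group has no valid committed offset; the function `starting_offset` and its parameter names are hypothetical, used here only to show that rule.

```python
def starting_offset(committed, log_start, log_end, auto_offset_reset="latest"):
    """Pick where a restarted consumer resumes reading in one partition.

    committed: last committed offset for the group, or None if there is none.
    log_start: first offset still retained in the partition.
    log_end:   offset that the next produced message will receive.
    """
    # A valid committed offset always wins over the reset policy.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # No usable committed offset: fall back to the reset policy.
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    raise ValueError(f"unsupported reset policy: {auto_offset_reset}")


resumed = starting_offset(committed=42, log_start=10, log_end=100)
from_earliest = starting_offset(None, 10, 100, auto_offset_reset="earliest")
from_latest = starting_offset(None, 10, 100, auto_offset_reset="latest")
expired = starting_offset(5, 10, 100, auto_offset_reset="earliest")
print(resumed, from_earliest, from_latest, expired)
```

The last call models a committed offset that retention has already removed, so the consumer falls back to the first offset still available.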
12. Types of Message Delivery
● At most once delivered
  ○ If the producer does not retry when an ack times out, the message might never be written to the Kafka topic.
  ○ Producer waits for only one ack: acks=1
  ○ 20 times faster.
● At least once delivered
  ○ If the broker fails right after the message was successfully written to the Kafka topic but before it sent the ack, the producer's retry leads to the message being written twice. (Standard SQS)
  ○ Producer waits for all the acks: acks=all
  ○ 3 times faster.
● Exactly once delivered
  ○ A unique identifier is required, so that whenever the producer sends a duplicate, the broker does not store that message again: enable.idempotence=true (FIFO SQS)
  ○ Difficult to handle at the consumer end; manual offsets need to be handled carefully.
  ○ An alternative is a transaction that spans from the producer's send until the ack is received from the consumer.
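The duplicate scenario and its idempotent fix can be sketched with a toy broker. Kafka's idempotent producer relies on a producer id plus sequence numbers; the `Broker` class below is a hypothetical, heavily simplified model of that idea, not the real protocol.

```python
class Broker:
    """Toy broker that drops retried messages it has already stored."""

    def __init__(self):
        self.log = []
        self.last_sequence = {}  # producer id -> highest sequence stored

    def append(self, producer_id, sequence, message):
        """Store the message unless this producer already sent this sequence."""
        if sequence <= self.last_sequence.get(producer_id, -1):
            return "duplicate"  # retry of an already-written message
        self.log.append(message)
        self.last_sequence[producer_id] = sequence
        return "stored"


broker = Broker()
# The first send is written, but assume its ack is lost on the way back.
first_send = broker.append("producer-1", 0, "payment-1")
# The producer retries with the same sequence number.
retry = broker.append("producer-1", 0, "payment-1")
next_send = broker.append("producer-1", 1, "payment-2")
print(first_send, retry, next_send, broker.log)
```

Without the sequence check the retry would be appended again, which is the at-least-once behaviour described above.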
13. Zookeeper
● Electing a controller. The controller maintains the leader/follower relationship for all the partitions.
● When a node shuts down, the controller tells other replicas to become partition leaders.
● Manages service discovery for the Kafka brokers that form the cluster.
● Sends topology changes to Kafka, so each node in the cluster knows when a new broker joined, a broker died, a topic was removed, a topic was added, etc.
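What happens to partition leadership when a broker goes down can be sketched as follows. This is a simplification under stated assumptions: `elect_leaders` is a hypothetical helper that picks the first live replica, whereas a real cluster chooses among in-sync replicas.

```python
def elect_leaders(partition_replicas, live_brokers):
    """Pick the first live replica of each partition as its leader."""
    leaders = {}
    for partition, replicas in partition_replicas.items():
        leaders[partition] = next(
            (broker for broker in replicas if broker in live_brokers), None
        )
    return leaders


# Hypothetical replica placement: partition id -> broker ids, preferred first.
partition_replicas = {0: [1, 2, 3], 1: [2, 3, 1], 2: [3, 1, 2]}

all_up = elect_leaders(partition_replicas, live_brokers={1, 2, 3})
broker_1_down = elect_leaders(partition_replicas, live_brokers={2, 3})
print(all_up, broker_1_down)
```

When broker 1 is removed, only partition 0 changes leader; the other partitions keep theirs.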
14. SQS vs Kafka
| Parameter | AWS SQS | Apache Kafka |
|---|---|---|
| Order of Messages | Standard Queue: can be out of order; FIFO Queue: in order within a message group | In order within the partition |
| Message Delivery | Standard Queue: at least once delivered; FIFO Queue: exactly once delivered | Provides all three types of message delivery: at most once, at least once, and exactly once |
| Retention | Default: 4 days; up to 14 days | Default: 7 days; configurable |
| Metrics | CloudWatch Metrics | openTSDB, to analyse the number of messages yet to be consumed on each partition |
| Security | IAM, AWS KMS (Key Management Service) | Kerberos |
| Consume same message | Connect SQS with SNS | Using consumer groups |
| Cost | Pay as you use; depends on req/sec and data transfer/sec | Open-source; server cost and management cost |
| Long Polling | Can reduce cost; max value: 20 sec | Not provided as a feature; the poll interval is configurable |
| Exception Handling | Dead-letter queues | Handle manually: create a separate topic for this |
| Message Size | Default: 256 KB; to increase further, connect with S3 (supports up to 2 GB) | Default: 1 MB; to increase further, change the configs of producer, brokers, and consumers |
| Serialization/Deserialization | Default: String | Default: String; Avro, protobuf |
| Throughput | Standard Queue: unlimited; FIFO Queue: 300/sec (3000/sec with batches of 10 messages) | Very high |
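The manual exception handling listed for Kafka, a separate topic for failed messages, can be sketched as below. This is a minimal illustration with plain lists standing in for topics; `process_with_dead_letter` and its parameters are hypothetical names.

```python
def process_with_dead_letter(messages, handler, max_attempts=3):
    """Process messages; park the ones that keep failing in a dead-letter list."""
    processed = []
    dead_letter_topic = []  # stand-in for a separate Kafka topic
    for message in messages:
        last_error = None
        for _ in range(max_attempts):
            try:
                processed.append(handler(message))
                break
            except ValueError as error:
                last_error = error
        else:
            # All attempts failed: keep the message and the reason for later review.
            dead_letter_topic.append({"message": message, "error": str(last_error)})
    return processed, dead_letter_topic


# int() raises ValueError for "abc", so that message ends up dead-lettered.
processed, dead_letter_topic = process_with_dead_letter(["1", "abc", "3"], int)
print(processed, [entry["message"] for entry in dead_letter_topic])
```

With SQS the equivalent behaviour comes from configuring a dead-letter queue rather than writing this loop by hand.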