KAFKA
Mahendran Ponnusamy
Kafka
• Distributed, scalable, durable, fault-tolerant, high throughput publish-subscribe messaging system
• Originally developed at LinkedIn
• Neha Narkhede and Jun Rao - Confluent
• Unified platform for handling all the real-time data feeds
LinkedIn defines four categories of messages (queuing, metrics, logs, and tracking data), each of which lives in its own cluster.
Common Use Cases
1. Stream Processing
2. Website Activity Tracking
3. Metrics Collection and Monitoring
4. Log Aggregation
5. ..
6. ..
Comparison with other systems
JMS, RabbitMQ, …                     | Apache Kafka
Push model                           | Pull model
Persistent message with TTL          | Retention policy
Guaranteed delivery                  | Guaranteed “consumability”
Hard to scale                        | Scalable
Fault tolerance: active-passive      | Fault tolerance: ISR (In-Sync Replicas)
Key Components
• Topic
• Stream of records of a particular category.
• Each record consists of a key, a value, and a timestamp (from 0.10.0)
• Producer
• Publishes messages to a topic
• Broker
• Where the messages are stored
• Consumer
• Subscribes to one or more topics
• Consumes the messages from the partitions
• Multiple producers and consumers can publish and retrieve messages at the same time.
LinkedIn runs over 1100 Kafka brokers organized into more than 60 clusters.
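[Figure: a Kafka cluster of brokers (Kafka Broker 1, Kafka Broker 2, … Kafka Broker n) holding topic partitions (Topic Partition 0, Topic Partition 1, … Topic Partition n). Producers and consumers connect to the brokers; ZooKeeper holds broker and topic metadata and consumer metadata (partition offsets).]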
ZooKeeper and Kafka
• Electing a controller
• Cluster membership: tracking brokers as they join the cluster or fail
• Topic configuration
• (0.9.0) - Quotas - how much data each client is allowed to read and write
• (0.9.0) - ACLs - who is allowed to read and write to which topic
• Within ZooKeeper, all write requests are routed through the leader and changes are broadcast to all followers. This change broadcast is termed atomic broadcast.
Core API
• Producer API
• Consumer API
• Streams API
• Connector API
KAFKA STORAGE
Brokers
• Data storage
• Stateless
• Identified by broker.id
Message Format
Messages
• Messages are stored as Key-Value pairs
• Message Ids are incremental but not consecutive
• Messages are identified by 64 bit offset integer
CRC | Magic Byte | Attributes | Key Length | Key | Message Length | Message
Current Message Format
• MessageAndOffset => MessageSize Offset Message
    MessageSize => int32
    Offset => int64
• Message => Crc MagicByte Attributes KeyLength Key ValueLength Value
    Crc => int32
    MagicByte => int8
    Attributes => int8
    KeyLength => int32
    Key => bytes
    ValueLength => int32
    Value => bytes
• The magic byte (int8) contains the version id of the message format. It is 0 for the format shown here; the format with a timestamp field, introduced in 0.10.0, uses magic byte 1.
• The attribute byte (int8) holds metadata attributes about the message. The lowest 2 bits contain the compression codec used for the message. The other bits are currently set to 0.
• The key / value field can be omitted if the KeyLength / ValueLength field is set to -1.
• For a compressed message, the offset field stores the offset of the last wrapped message.
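The field layout above can be made concrete with a small encoder. This is a minimal sketch, not Kafka's own code: the class and method names are hypothetical, and it assumes the CRC is a CRC32 computed over every byte that follows the Crc field (magic byte through value). The MessageSize and Offset prefix is left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper that lays out one message as:
// Crc MagicByte Attributes KeyLength Key ValueLength Value
final class MessageFormatSketch {

    static byte[] encode(byte[] key, byte[] value) {
        int keyLen = (key == null) ? -1 : key.length;       // -1 marks an omitted key
        int valueLen = (value == null) ? -1 : value.length; // -1 marks an omitted value
        // Everything after the 4-byte Crc field.
        int bodySize = 1 + 1 + 4 + Math.max(keyLen, 0) + 4 + Math.max(valueLen, 0);

        ByteBuffer buf = ByteBuffer.allocate(4 + bodySize); // big-endian by default
        buf.position(4);          // leave room for the Crc
        buf.put((byte) 0);        // MagicByte: format version 0
        buf.put((byte) 0);        // Attributes: lowest 2 bits = 0, no compression
        buf.putInt(keyLen);
        if (key != null) buf.put(key);
        buf.putInt(valueLen);
        if (value != null) buf.put(value);

        // Assumption: the checksum covers the bytes after the Crc field.
        CRC32 crc = new CRC32();
        crc.update(buf.array(), 4, bodySize);
        buf.putInt(0, (int) crc.getValue());
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] msg = encode("k".getBytes(StandardCharsets.UTF_8),
                            "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("encoded message size in bytes: " + msg.length);
    }
}
```

Passing null for the key writes a KeyLength of -1 and no key bytes, which matches the rule that the key can be omitted.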
Topic
• A topic is a stream of messages of a particular category
• A topic can have zero, one, or many consumers that subscribe to the data written to it
Partition
• A topic can be divided into partitions which may be distributed.
• An ordered, immutable sequence of records that is continually appended to - a structured
commit log
Partition continued…
• Distribute the data across brokers (think sharding)
• Simplify parallelization
• Ensure sequencing of related messages (ordering)
• Kafka only provides a total order over messages within a partition, not between different
partitions in a topic
• Replicated
-rw-r--r-- 1 kafka kafka 0 Nov 4 16:03 edhTopic-45/00000000000000000000.log
-rw-r--r-- 1 kafka kafka 10485760 Nov 4 16:03 edhTopic-45/00000000000000000000.index
-rw-r--r-- 1 kafka kafka 0 Nov 4 16:03 edhTopic-61/00000000000000000000.log
-rw-r--r-- 1 kafka kafka 10485760 Nov 4 16:03 edhTopic-61/00000000000000000000.index
Log
• Simplest possible storage abstraction.
• A log is implemented as a set of segment files of approximately the same size (e.g., 1GB).
• Append-only, totally-ordered sequence of records ordered by time.
• A log, like a filesystem, is easy to optimize for linear read and write patterns.
• The log can group small reads and writes together into larger, high-throughput operations
Log continued…
• Every segment of a log (the *.log files) has its corresponding index (the *.index files) with the same name, which represents the segment's base offset.
• The log file contains the actual messages structured in a message format.
• Messages begin with a 64-bit integer offset
• Properties:
• log.roll.hours
The maximum time before a new log segment is rolled out (in hours), secondary to the log.roll.ms property
• log.roll.ms
The maximum time before a new log segment is rolled out (in milliseconds). If not set, the value in log.roll.hours is used
• log.segment.bytes
The maximum size of a single log file
Segment File
• A segment with a base offset of [base_offset] would be stored in two files, a [base_offset].index
and a [base_offset].log file.
• The broker simply appends the message to the last (active) segment file.
• The segment file is flushed to disk after a configurable number of messages has been published
• log.flush.interval.messages
• Or after a certain amount of time has elapsed
• log.flush.interval.ms
• Messages are exposed to consumers only after they have been flushed.
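The naming rule for segment files can be sketched in a few lines. This is an illustration, not broker code: the class and method names are hypothetical, and it assumes a 20-digit zero-padded base offset, as in the directory listing shown earlier. A sorted map keyed by base offset then finds the segment that contains a given offset.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of how segments are named and located by base offset.
final class SegmentLookupSketch {

    // A segment is named after the offset of its first message, zero-padded to 20 digits.
    static String segmentFileName(long baseOffset) {
        return String.format(Locale.ROOT, "%020d.log", baseOffset);
    }

    // The segment holding `offset` is the one with the greatest base offset <= offset.
    static String segmentFor(TreeMap<Long, String> segmentsByBaseOffset, long offset) {
        Map.Entry<Long, String> entry = segmentsByBaseOffset.floorEntry(offset);
        return (entry == null) ? null : entry.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Long, String> segments = new TreeMap<>();
        for (long base : new long[] {0L, 368769L, 737337L}) { // made-up base offsets
            segments.put(base, segmentFileName(base));
        }
        System.out.println(segmentFor(segments, 400000L));
    }
}
```

New messages always go to the segment with the highest base offset, which is the active segment described above.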
Index file
• Maps the logical offset of the message to the physical location
• Each entry in the index file consists of only 2 fields, each of them 32 bits long:
• 4 Bytes: Relative Offset
• 4 Bytes: Physical Position
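A lookup against such an index can be sketched as a binary search. This is a simplified illustration with hypothetical names; it assumes the index is sparse (not every message has an entry), so the result is the file position from which to start scanning the log file forward.

```java
// Hypothetical sketch of an offset lookup in a sparse index.
// Each entry is {relativeOffset, physicalPosition}, sorted by relativeOffset.
final class OffsetIndexSketch {

    // Returns the position of the last entry at or before targetOffset,
    // or 0 (start of the segment file) if no such entry exists.
    static int lookup(int[][] entries, long baseOffset, long targetOffset) {
        // Offsets are stored relative to the segment's base offset to fit in 4 bytes.
        int relative = (int) (targetOffset - baseOffset);
        int lo = 0;
        int hi = entries.length - 1;
        int position = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (entries[mid][0] <= relative) {
                position = entries[mid][1]; // candidate; look for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return position;
    }

    public static void main(String[] args) {
        int[][] entries = { {0, 0}, {50, 4096}, {100, 8192} }; // made-up index content
        System.out.println(lookup(entries, 1000L, 1075L));
    }
}
```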
<offset, time-stamp, physical position>
Kafka Storage Architecture
Key Points
• Topic partitions can be replicated zero or n times and distributed across the Kafka cluster
• Each topic partition has one leader and zero or n followers, depending on the replication factor
• The leader maintains the set of In-Sync Replicas (ISR): the replicas whose lag behind the partition leader is lower than replica.lag.time.max.ms
[root@worker1 config]# kafka-topics --describe --zookeeper worker1:2181/kafka --topic edhTopic
Topic:edhTopic PartitionCount:72 ReplicationFactor:2 Configs:
Topic: edhTopic Partition: 0 Leader: 26 Replicas: 26,27 Isr: 26,27
Topic: edhTopic Partition: 1 Leader: 27 Replicas: 27,25 Isr: 25,27
Topic: edhTopic Partition: 2 Leader: 25 Replicas: 25,26 Isr: 25,26
Topic: edhTopic Partition: 3 Leader: 26 Replicas: 26,25 Isr: 25,26
Topic: edhTopic Partition: 4 Leader: 27 Replicas: 27,26 Isr: 26,27
Topic: edhTopic Partition: 5 Leader: 25 Replicas: 25,27 Isr: 25,27
Topic: edhTopic Partition: 6 Leader: 26 Replicas: 26,27 Isr: 26,27
Topic: edhTopic Partition: 7 Leader: 27 Replicas: 27,25 Isr: 25,27
Producer
• The producer sends data directly to the broker that is the leader for the partition, without any intervening routing tier
• Any broker can answer a request for metadata about which servers are alive and where the leaders for the partitions of a topic are at any given time, which allows the producer to direct its requests appropriately
• The client controls which partition it publishes messages to. This can be done at random, implementing a kind of random load balancing, or by some semantic partitioning function
Producer Execution Flow
Constructing a Kafka producer
• Bootstrap servers
• Key serializer
• Value serializer
Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers", "broker1:9092,broker2:9092");
kafkaProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
kafkaProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> producer = new KafkaProducer<>(kafkaProps);
Partitioning
• The client controls which partition it publishes messages to
• Random partitioning
• When the partitioning key is not specified or null, the producer picks a random partition and sticks to it for some time (default is 10 minutes) before switching to another one
• If a key exists and the default partitioner is used
• hash(key) % number of partitions (the default partitioner hashes the serialized key bytes with murmur2)
• Can implement custom partitioning
• Implement the interface org.apache.kafka.clients.producer.Partitioner
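The key-based rule can be sketched without the Kafka client library. The class and method names here are hypothetical, and Arrays.hashCode is only a stand-in for the murmur2 hash, so the partition numbers will differ from what a real producer computes. What carries over is the property that the same key always maps to the same partition.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of key-based partition selection: hash(key) % numPartitions.
final class PartitionerSketch {

    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);        // stand-in hash, NOT murmur2
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 8; // made-up partition count
        for (String key : new String[] {"user-1", "user-2", "user-1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

Because the result depends on the number of partitions, adding partitions to a topic changes where existing keys land.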
• Fire-and-forget
• Send a message to the server and don't check whether it arrived
• Synchronous send
• send() returns a Future; calling get() on it blocks until the broker replies and reveals whether the send succeeded
• Asynchronous send
• Send messages asynchronously and still handle error scenarios
• Pass a callback, which is triggered when the producer receives a response from the Kafka broker
// Fire-and-forget
producer.send(record);
// Synchronous send
producer.send(record).get();
// Asynchronous send with a callback
producer.send(record, new DemoProducerCallback());
Batching
• Batches the records to be sent to the same partition of a topic
• Allows the accumulation of more bytes to send, and fewer, larger I/O operations on the servers
• Requests sent to brokers contain multiple batches, one for each partition with data available to be sent
• Send once a batch reaches a fixed size, or wait no longer than some fixed latency bound (say 10 ms)
• This can be achieved by:
batch.size = 16384 (bytes)
linger.ms = 5 (milliseconds)
Buffer memory
• Amount of memory the producer will use to buffer messages waiting to be sent to brokers.
• If records are sent faster than they can be delivered to the server, the producer will either block or throw an exception, based on the preference specified by block.on.buffer.full (deprecated since 0.9.0 in favor of max.block.ms).
• Property:
buffer.memory
Durability
• When a producer sends messages to Kafka, it can require different levels of durability:
• acks = 0
producer doesn’t wait for confirmation
• acks = 1
wait for acknowledgement from the leader
• acks = all
wait for acknowledgement from all in-sync replicas (ISR); the message is then committed
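The batching, buffering, and durability settings above could be collected into one producer configuration, roughly as follows. This sketch only builds a java.util.Properties object and does not create a producer; the class name is hypothetical, the broker addresses are the placeholder ones used earlier, and the buffer.memory value shown is the documented default of 32 MB.

```java
import java.util.Properties;

// Hypothetical sketch: producer settings for batching, buffering, and durability.
final class ProducerConfigSketch {

    static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("batch.size", "16384");       // bytes per partition batch
        props.put("linger.ms", "5");            // wait up to 5 ms for a batch to fill
        props.put("buffer.memory", "33554432"); // total bytes for records waiting to be sent
        props.put("acks", "all");               // wait for all in-sync replicas
        return props;
    }

    public static void main(String[] args) {
        producerProps().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Raising linger.ms trades a little latency for larger batches, while acks=all trades latency for durability.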
Consumer
1. bootstrap.servers
2. key.deserializer
3. value.deserializer
Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092");
props.put("group.id", "CountryCounter");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Collections.singletonList("customerCountries"));
Consumer Poll Loop
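• poll() handles coordination, partition rebalances, heartbeats, and data fetching
• consumer.wakeup(), called from another thread, makes poll() throw a WakeupException so that the loop can exit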
try {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        for (ConsumerRecord<String, String> record : records) {
            // some processing
        }
    }
} finally {
    consumer.close();
}
How consumer consumes?
• A consumer always consumes messages from a particular partition sequentially; if it acknowledges a particular message offset, this implies that it has consumed all prior messages
• The consumer sends a pull request to the broker to have the bytes ready to consume
• Each request carries the offset of the message to consume from
Consumer Group
[Figure: Topic T1 with four partitions, consumed by Consumer Group g1, which contains Consumer 1, Consumer 2, Consumer 3, and Consumer 4.]
When multiple consumers are subscribed to a topic and belong to the same consumer group, then each
consumer in the group will receive messages from a different subset of the partitions in the topic
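The way a group splits a topic's partitions can be illustrated with a simple round-robin assignment. This is a sketch with hypothetical names and a deliberately simplified strategy; it is not Kafka's actual assignor, which also handles rebalances and multiple topics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: spread partitions over the consumers of one group.
final class GroupAssignmentSketch {

    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        // Each partition goes to exactly one consumer in the group.
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(assign(Arrays.asList("Consumer 1", "Consumer 2"), 4));
        // With more consumers than partitions, the extra consumer gets nothing.
        System.out.println(assign(Arrays.asList("C1", "C2", "C3", "C4", "C5"), 4));
    }
}
```

The second call shows why adding consumers beyond the partition count does not increase parallelism: the surplus consumers stay idle.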
Committing Offsets
• Automatic Commit
• Allow the consumer to do it for you
• Commit Current Offset
• Exercise more control over when offsets are committed
• Offsets are committed only when the application explicitly chooses to do so
• commitSync() retries the commit until it either succeeds or encounters a non-retriable failure
• Asynchronous commit
• commitAsync() does not retry.
• commitAsync() also lets you pass in a callback that is triggered when the broker responds
• The callback receives any failure, so the application can log or handle it
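Automatic commit:
enable.auto.commit = true
auto.commit.interval.ms = 5000 (default, 5 seconds)
Synchronous commit:
enable.auto.commit = false
consumer.commitSync();
Asynchronous commit:
enable.auto.commit = false
consumer.commitAsync((offsets, exception) -> { /* handle or log a failed commit */ });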
Message Delivery Semantics
• At Most Once
• Commit first, then process
• Might lose some messages when processing fails.
• At Least Once
• Process first, then commit
• Might get duplicates when the commit fails
• Exactly-Once
• With exactly-once semantics, messages are pulled one or more times, processed only once, and delivery is
guaranteed.
• Exactly-once semantics is ideal for operational applications, as it guarantees no duplicates or missing data.
• Many enterprise applications, like those used for credit card processing, require exactly-once semantics.
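The difference between the first two semantics comes down to the order of commit and process. The following simulation is a sketch with hypothetical names and no Kafka dependency: it replays a small in-memory log, injects one crash right after the first step for a chosen message, and restarts from the last committed offset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of at-most-once versus at-least-once consumption.
final class DeliverySemanticsSketch {

    // commitFirst = true  -> commit, then process (at-most-once)
    // commitFirst = false -> process, then commit (at-least-once)
    // The consumer crashes once, between the two steps, while handling index crashAt.
    static List<String> run(List<String> log, int crashAt, boolean commitFirst) {
        List<String> processed = new ArrayList<>();
        int committed = 0;      // offset to resume from after a restart
        boolean crashed = false;
        int pos = 0;
        while (pos < log.size()) {
            if (commitFirst) {
                committed = pos + 1;
            } else {
                processed.add(log.get(pos));
            }
            if (pos == crashAt && !crashed) {
                crashed = true;
                pos = committed; // restart from the last committed offset
                continue;
            }
            if (commitFirst) {
                processed.add(log.get(pos));
            } else {
                committed = pos + 1;
            }
            pos++;
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("a", "b", "c");
        System.out.println("at-most-once:  " + run(log, 1, true));
        System.out.println("at-least-once: " + run(log, 1, false));
    }
}
```

With the crash at index 1, the commit-first run skips "b" entirely, while the process-first run handles "b" twice.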
Log Compaction and Delete
Retention policy
• For a specific amount of time
• log.retention.hours
• log.retention.minutes
• log.retention.ms
• For a specific total size of messages in a partition
• log.retention.bytes
Log Compaction
• Compaction is a process that retains only the most recent message for each message key.
• config/server.properties:
log.cleaner.enable=true
• To enable log cleaning on a particular topic, set the topic-level property
cleanup.policy=compact
(log.cleanup.policy=compact sets the same policy as the broker-wide default)
Key points about log compaction
• The “min.cleanable.dirty.ratio” is a setting at the topic and broker level that determines how
“dirty” a topic needs to be before it is cleaned. You can set it to 0.01 to be aggressive in cleaning
• Log compaction runs on its own threads, and it defaults to 1 thread. It isn’t unusual for a cleaner
thread to die.
• Compaction is done in the background by periodically recopying log segments
• Log compaction will never happen on the LAST segment. Segments can be rolled over based on
time or size, or both. The default time based rollover is 7 days
How Compaction works
As of Kafka 0.9.0.1, the configuration parameter log.cleaner.enable is true by default. This means topics with cleanup.policy=compact are compacted by default, and 128 MB of heap is allocated to the cleaner process via log.cleaner.dedupe.buffer.size
Deletion of messages
• Compaction also allows for deletes.
• A message with a key and a null payload will be treated as a delete from the log.
• This delete marker will cause any prior message with that key to be removed (as would any new
message with that key), but delete markers are special in that they will themselves be cleaned out
of the log after a period of time to free up space
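The effect of compaction and delete markers on a partition can be sketched with an ordered map. The class and method names are hypothetical, and the sketch simplifies in one respect: it drops a delete marker immediately, whereas the broker keeps it in the log for a period of time before cleaning it out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: keep only the latest value per key; a null value deletes the key.
final class CompactionSketch {

    // Each record is a {key, value} pair, listed in log order.
    static Map<String, String> compact(List<String[]> records) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] record : records) {
            String key = record[0];
            String value = record[1];
            latest.remove(key);          // any earlier message for this key is superseded
            if (value != null) {
                latest.put(key, value);  // re-insert so surviving records stay in log order
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"k1", "v1"});
        records.add(new String[] {"k2", "v2"});
        records.add(new String[] {"k1", "v3"});
        records.add(new String[] {"k2", null}); // delete marker for k2
        System.out.println(compact(records));
    }
}
```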
Web(li)ography
• https://cwiki.apache.org/confluence/display/KAFKA/FAQ
• https://www.ibm.com/developerworks/linux/library/j-zerocopy/
• http://kafka.apache.org/documentation.html#brokerconfigs
• http://events.linuxfoundation.org/sites/events/files/slides/The%20Best%20of%20Apache%20Kafka%20Architecture.pdf
• https://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0.9-consumer-client
• https://apache.googlesource.com/kafka/+/0.8.1/core/src/main/scala/kafka/log/LogSegment.scala
• https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Enriched+Message+Metadata
• https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
• http://www.shayne.me/blog/2015/2015-06-25-everything-about-kafka-part-2/
• https://www.safaribooksonline.com/library/view/kafka-the-definitive/9781491936153/ch03.html#writing_messages_to_kafka
• https://kafka.apache.org/090/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
• https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design
Questions/feedback?
mahen.it@gmail.com

Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Kafka overview v0.1

  • 2. Kafka
      • Distributed, scalable, durable, fault-tolerant, high-throughput publish-subscribe messaging system
      • Originally developed at LinkedIn
      • Neha Narkhede and Jun Rao - Confluent
      • Unified platform for handling all real-time data feeds. LinkedIn defines four categories of messages (queuing, metrics, logs and tracking data), each of which lives in its own cluster.
    Common use cases
      1. Stream processing
      2. Website activity tracking
      3. Metrics collection and monitoring
      4. Log aggregation
      5. ..
      6. ..
  • 3. Comparison with other systems
      JMS, RabbitMQ, …                  | Apache Kafka
      Push model                        | Pull model
      Persistent message with TTL       | Retention policy
      Guaranteed delivery               | Guaranteed "consumability"
      Hard to scale                     | Scalable
      Fault tolerance: active-passive   | Fault tolerance: ISR (in-sync replicas)
  • 4. Key Components
      • Topic
          • Stream of records of a particular category
          • Each record consists of a key, a value, and a timestamp (from 0.10.0)
      • Producer
          • Publishes messages to a topic
      • Broker
          • Where the messages are stored
      • Consumer
          • Subscribes to one or more topics
          • Consumes the messages from the partitions
      • Multiple producers and consumers can publish and retrieve messages at the same time.
      • LinkedIn runs over 1100 Kafka brokers organized into more than 60 clusters.
      [Diagram labels: producer, consumer, Kafka cluster (Kafka Broker 1, Kafka Broker 2, … Kafka Broker n; Topic Partition 0, Topic Partition 1, … Topic Partition n), ZooKeeper (broker and topic metadata; consumer metadata, partition offsets)]
  • 5. ZooKeeper and Kafka
      • Electing a controller
      • Cluster membership: tracking brokers as they are added or fail
      • Topic configuration
      • (0.9.0) Quotas: how much data each client is allowed to read and write
      • (0.9.0) ACLs: who is allowed to read and write to which topic
      • All write requests are routed through the leader, and changes are broadcast to all followers. This change broadcast is termed atomic broadcast.
  • 6. Core API
      • Producer API
      • Consumer API
      • Streams API
      • Connector API
  • 8. Brokers
      • Data storage
      • Stateless
      • Identified by broker.id
  • 9. Message Format
      • Messages are stored as key-value pairs
      • Message ids are incremental but not consecutive
      • Messages are identified by a 64-bit integer offset
      • Layout: CRC | Magic Byte | Attributes | Key Length | Key | Message Length | Message
  • 10. Current Message Format
      • Grammar:
          MessageAndOffset => MessageSize Offset Message
            MessageSize => int32
            Offset => int64
          Message => Crc MagicByte Attributes KeyLength Key ValueLength Value
            Crc => int32
            MagicByte => int8
            Attributes => int8
            KeyLength => int32
            Key => bytes
            ValueLength => int32
            Value => bytes
      • The magic byte (int8) contains the version id of the message, currently set to 0.
      • The attribute byte (int8) holds metadata attributes about the message. The lowest 2 bits contain the compression codec used for the message. The other bits are currently set to 0.
      • The key or value field can be omitted if the KeyLength or ValueLength field is set to -1.
      • For a compressed message, the offset field stores the offset of the last wrapped message.
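The grammar above can be made concrete with a small encoder. This is only a sketch using the JDK's `ByteBuffer` and `CRC32`, not Kafka's own serialization code; the class name `MessageV0Sketch` is hypothetical, and the sketch assumes the CRC covers every byte after the Crc field and that no compression is used.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical helper illustrating the slide's grammar:
// Message => Crc MagicByte Attributes KeyLength Key ValueLength Value
class MessageV0Sketch {
    /** Encodes one uncompressed message; a null key or value is written as length -1. */
    static ByteBuffer encode(byte[] key, byte[] value) {
        int keyLen = (key == null) ? -1 : key.length;
        int valueLen = (value == null) ? -1 : value.length;
        // magic (1) + attributes (1) + key length (4) + key + value length (4) + value
        int bodySize = 1 + 1 + 4 + Math.max(keyLen, 0) + 4 + Math.max(valueLen, 0);
        ByteBuffer buf = ByteBuffer.allocate(4 + bodySize); // big-endian by default
        buf.position(4);        // leave room for the CRC, filled in last
        buf.put((byte) 0);      // magic byte: version 0
        buf.put((byte) 0);      // attributes: lowest 2 bits = codec, 0 = none
        buf.putInt(keyLen);
        if (key != null) buf.put(key);
        buf.putInt(valueLen);
        if (value != null) buf.put(value);
        CRC32 crc = new CRC32();
        crc.update(buf.array(), 4, bodySize); // assumption: CRC over everything after the Crc field
        buf.putInt(0, (int) crc.getValue());
        buf.flip();
        return buf;
    }
}
```

With a 1-byte key and a 5-byte value the encoded message is 20 bytes long: 4 for the CRC, 2 for magic and attributes, 4 + 1 for the key and 4 + 5 for the value.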
  • 11. Topic
      • A topic is a stream of messages of a particular category
      • A topic can have zero, one, or many consumers that subscribe to the data written to it
  • 12. Partition
      • A topic can be divided into partitions, which may be distributed across brokers.
      • Each partition is an ordered, immutable sequence of records that is continually appended to: a structured commit log
  • 13. Partition continued…
      • Distribute the data across brokers (think sharding)
      • Simplify parallelization
      • Ensure sequencing of related messages (ordering)
      • Kafka only provides a total order over messages within a partition, not between different partitions in a topic
      • Replicated
      -rw-r--r-- 1 kafka kafka        0 Nov 4 16:03 edhTopic-45/00000000000000000000.log
      -rw-r--r-- 1 kafka kafka 10485760 Nov 4 16:03 edhTopic-45/00000000000000000000.index
      -rw-r--r-- 1 kafka kafka        0 Nov 4 16:03 edhTopic-61/00000000000000000000.log
      -rw-r--r-- 1 kafka kafka 10485760 Nov 4 16:03 edhTopic-61/00000000000000000000.index
  • 14. Log
      • Simplest possible storage abstraction.
      • A log is implemented as a set of segment files of approximately the same size (e.g., 1 GB).
      • Append-only, totally ordered sequence of records ordered by time.
      • A log, like a filesystem, is easy to optimize for linear read and write patterns.
      • The log can group small reads and writes together into larger, high-throughput operations.
  • 15. Log continued…
      • Every segment of a log (the *.log files) has its corresponding index (the *.index files) with the same name; the name represents the segment's base offset.
      • The log file contains the actual messages, structured in the message format.
      • Messages begin with a 64-bit integer offset
      • Properties:
          • log.roll.hours: the maximum time before a new log segment is rolled out (in hours), secondary to the log.roll.ms property
          • log.roll.ms: the maximum time before a new log segment is rolled out (in milliseconds). If not set, the value in log.roll.hours is used
          • log.segment.bytes: the maximum size of a single log file
  • 16. Segment File
      • A segment with a base offset of [base_offset] is stored in two files: [base_offset].index and [base_offset].log.
      • The broker simply appends each message to the last (active) segment file.
      • The segment file is flushed to disk after a configurable number of messages has been published (log.flush.interval.messages) or after a certain amount of time has elapsed (log.flush.interval.ms).
      • In the original, pre-replication design, messages were exposed to consumers only after the flush. With replication (0.8 and later), a message is exposed once it is committed to the in-sync replicas.
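The naming rule for segment files can be sketched in a few lines. The 20-digit zero padding matches the file names in the directory listing on the partition slide; the class and method names here are hypothetical, and the lookup is an illustration of the idea rather than the broker's implementation.

```java
import java.util.TreeSet;

// Hypothetical helper: segment file names and "which segment holds this offset".
class SegmentLookup {
    /** File name of the log file for a segment, e.g. base offset 0 -> 00000000000000000000.log */
    static String logFileName(long baseOffset) {
        return String.format("%020d.log", baseOffset);
    }

    /** File name of the matching index file. */
    static String indexFileName(long baseOffset) {
        return String.format("%020d.index", baseOffset);
    }

    /** Base offset of the segment containing targetOffset, or -1 if it precedes every segment. */
    static long segmentFor(TreeSet<Long> baseOffsets, long targetOffset) {
        Long base = baseOffsets.floor(targetOffset); // greatest base offset <= target
        return (base == null) ? -1L : base;
    }
}
```

Because segments are named after their base offset, finding the segment for an offset reduces to a floor lookup over the sorted base offsets.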
  • 17. Index file
      • Maps the logical offset of a message to its physical location
      • Each entry in the index file has only 2 fields, each 32 bits long:
          • 4 bytes: relative offset
          • 4 bytes: physical position
      <offset, time-stamp, physical position>
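A lookup against such an index can be sketched as a binary search for the last entry at or before the wanted offset. The in-memory arrays and the class name `OffsetIndexSketch` are assumptions made for illustration; the real index is a file of 8-byte entries, and the reader then scans the log file forward from the returned position.

```java
// Hypothetical in-memory model of one segment's offset index.
class OffsetIndexSketch {
    private final long baseOffset;
    private final int[] relativeOffsets; // ascending, relative to baseOffset
    private final int[] positions;       // byte position in the .log file for each entry

    OffsetIndexSketch(long baseOffset, int[] relativeOffsets, int[] positions) {
        this.baseOffset = baseOffset;
        this.relativeOffsets = relativeOffsets;
        this.positions = positions;
    }

    /** Byte position from which to start scanning the log file for targetOffset. */
    int lookup(long targetOffset) {
        int relative = (int) (targetOffset - baseOffset);
        int lo = 0, hi = relativeOffsets.length - 1, best = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (relativeOffsets[mid] <= relative) {
                best = mid;      // candidate: entry at or before the target
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return (best < 0) ? 0 : positions[best]; // no entry: scan from the start of the file
    }
}
```

Storing offsets relative to the segment's base offset is what lets each field fit in 4 bytes even though message offsets are 64-bit.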
  • 19. Key Points
      • Topic partitions can be replicated zero or n times and distributed across the Kafka cluster
      • Each topic partition has one leader and zero or more followers, depending on the replication factor
      • The leader maintains the set of in-sync replicas (ISR): the replicas whose delay behind the partition leader is lower than replica.lag.time.max.ms
      [root@worker1 config]# kafka-topics --describe --zookeeper worker1:2181/kafka --topic edhTopic
      Topic:edhTopic  PartitionCount:72  ReplicationFactor:2  Configs:
          Topic: edhTopic  Partition: 0  Leader: 26  Replicas: 26,27  Isr: 26,27
          Topic: edhTopic  Partition: 1  Leader: 27  Replicas: 27,25  Isr: 25,27
          Topic: edhTopic  Partition: 2  Leader: 25  Replicas: 25,26  Isr: 25,26
          Topic: edhTopic  Partition: 3  Leader: 26  Replicas: 26,25  Isr: 25,26
          Topic: edhTopic  Partition: 4  Leader: 27  Replicas: 27,26  Isr: 26,27
          Topic: edhTopic  Partition: 5  Leader: 25  Replicas: 25,27  Isr: 25,27
          Topic: edhTopic  Partition: 6  Leader: 26  Replicas: 26,27  Isr: 26,27
          Topic: edhTopic  Partition: 7  Leader: 27  Replicas: 27,25  Isr: 25,27
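The ISR rule can be illustrated with a toy model: a replica stays in sync as long as it last caught up with the leader within the allowed lag. This is a simplified sketch, not the broker's logic; the class name, the `lastCaughtUpMs` bookkeeping and the timestamps are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical model of how a leader could decide ISR membership by time lag.
class IsrSketch {
    /**
     * @param lastCaughtUpMs broker id -> time (ms) the replica was last fully caught up
     * @param nowMs          current time in ms
     * @param maxLagMs       plays the role of the maximum replica lag setting
     * @return sorted broker ids that are still considered in sync
     */
    static List<Integer> inSync(Map<Integer, Long> lastCaughtUpMs, long nowMs, long maxLagMs) {
        List<Integer> isr = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : lastCaughtUpMs.entrySet()) {
            if (nowMs - e.getValue() <= maxLagMs) {
                isr.add(e.getKey());
            }
        }
        Collections.sort(isr);
        return isr;
    }
}
```

A replica that drops out of the ISR still appears under Replicas in the `--describe` output but disappears from the Isr column until it catches up again.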
  • 21. Producer
      • The producer sends data directly to the broker that is the leader for the partition, without any intervening routing tier
      • Any Kafka node can answer a request for metadata about which servers are alive and where the leaders for the partitions of a topic are at any given time, which allows the producer to direct its requests appropriately
      • The client controls which partition it publishes messages to. This can be done at random, implementing a kind of random load balancing, or by some semantic partitioning function
  • 23. Constructing a Kafka producer
      • Bootstrap servers
      • Key serializer
      • Value serializer
      kafkaProps.put("bootstrap.servers", "broker1:9092,broker2:9092");
      kafkaProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
      kafkaProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
  • 24. Partitioning
      • The client controls which partition it publishes messages to
      • Random partitioning
          • When the partitioning key is not specified or is null, a producer picks a random partition and sticks to it for some time (default is 10 minutes) before switching to another one
      • If a key exists and the default partitioner is used
          • hash(key) % number of partitions (the Java client's default partitioner hashes the serialized key bytes with murmur2)
      • Custom partitioning is possible
          • Implement the interface org.apache.kafka.clients.producer.Partitioner
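The key-based rule can be sketched without the Kafka client on the classpath. The hash below is `Arrays.hashCode`, a stand-in chosen to keep the example short; it is not the murmur2 hash the Java client uses, so the partition numbers it produces will not match a real cluster. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of "hash the key, then take it modulo the partition count".
class KeyPartitionerSketch {
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);        // stand-in hash, NOT Kafka's murmur2
        return (hash & 0x7fffffff) % numPartitions;  // clear the sign bit, then modulo
    }
}
```

The property that matters is that the same key always lands in the same partition, which is what preserves per-key ordering. It also means that changing the number of partitions changes where existing keys map.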
  • 25. Sending messages
      • Fire-and-forget
          • Send a message to the server and do not check whether it arrived
              producer.send(record);
      • Synchronous send
          • Block on the returned Future to learn whether the send() was successful or not
              producer.send(record).get();
      • Asynchronous send
          • Send messages asynchronously and still handle error scenarios
          • Pass a callback, which is triggered when the producer receives a response from the Kafka broker
              producer.send(record, new DemoProducerCallback());
  • 26. Batching
      • Batches the data to be sent to the same partition of a topic
      • Allows the accumulation of more bytes to send, and fewer, larger I/O operations on the servers
      • Requests sent to brokers contain multiple batches, one for each partition with data available to be sent
      • The producer sends when a batch reaches a fixed size, or waits no longer than some fixed latency bound (say 10 ms)
      • This can be achieved with:
          batch.size = 16384   (bytes)
          linger.ms = 5        (milliseconds)
  • 27. Buffer memory
      • Amount of memory the producer will use to buffer messages waiting to be sent to brokers
      • If records are sent faster than they can be delivered to the server, the producer will either block or throw an exception, based on the preference specified by block.on.buffer.full (deprecated as of 0.9.0 in favor of max.block.ms)
      • Property: buffer.memory
  • 28. Durability
      • When a producer sends messages to Kafka, it can require different levels of consistency:
          • acks = 0: the producer does not wait for confirmation
          • acks = 1: wait for an acknowledgement from the leader
          • acks = all: wait for an acknowledgement from all ISR, i.e. until the message is committed
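The batching, buffering and durability settings from the last three slides could be combined into one producer configuration along these lines. It uses only `java.util.Properties`, so it runs without the Kafka client; the class and method names are hypothetical, the values are the ones quoted on the slides, and 33554432 bytes for buffer.memory is, to my knowledge, the client default rather than a recommendation.

```java
import java.util.Properties;

// Hypothetical helper that assembles producer settings; pass the result to a KafkaProducer.
class ProducerConfigSketch {
    static Properties durableProducerProps(String bootstrapServers) {
        Properties p = new Properties();
        p.put("bootstrap.servers", bootstrapServers);
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("acks", "all");               // wait for all in-sync replicas
        p.put("batch.size", "16384");       // bytes per partition batch
        p.put("linger.ms", "5");            // wait up to 5 ms to fill a batch
        p.put("buffer.memory", "33554432"); // bytes of memory for unsent records
        return p;
    }
}
```

The trade-off is latency against throughput and safety: acks=all and a non-zero linger.ms add delay per message but give larger batches and stronger delivery guarantees.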
  • 30. Consumer
      1. bootstrap.servers
      2. key.deserializer
      3. value.deserializer
      props.put("bootstrap.servers", "broker1:9092,broker2:9092");
      props.put("group.id", "CountryCounter");
      props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
      props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
      KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
      consumer.subscribe(Collections.singletonList("customerCountries"));
  • 31. Consumer Poll Loop
      • Handles coordination, partition rebalances, heartbeats and data fetching
      • consumer.wakeup() can be called from another thread to interrupt a blocked poll() and exit the loop
      try {
          while (true) {
              ConsumerRecords<String, String> records = consumer.poll(100);
              for (ConsumerRecord<String, String> record : records) {
                  // process the record
              }
          }
      } finally {
          consumer.close();
      }
  • 32. How does a consumer consume?
      • A consumer always consumes messages from a particular partition sequentially. If the consumer acknowledges a particular message offset, it implies that the consumer has consumed all prior messages
      • The consumer sends a pull request to the broker to have the bytes ready to consume
      • Each request carries the offset of the message from which to consume
  • 33. Consumer Group
      [Diagram labels: Topic T1 (Partition 0, Partition 1, Partition 2, Partition 4); Consumer Group g1 (Consumer 1, Consumer 2, Consumer 3, Consumer 4)]
      • When multiple consumers are subscribed to a topic and belong to the same consumer group, each consumer in the group receives messages from a different subset of the partitions in the topic
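How partitions might be divided among the members of a group can be shown with a simple round-robin split. This is an illustrative sketch, not one of Kafka's built-in assignors; the class name and the consumer ids are made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical round-robin split of a topic's partitions over a consumer group.
class GroupAssignmentSketch {
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String c : consumers) {
            assignment.put(c, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p); // each partition has exactly one owner in the group
        }
        return assignment;
    }
}
```

Two consequences follow from the one-owner-per-partition rule: adding consumers spreads the load only up to the number of partitions, and any consumers beyond that count sit idle.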
  • 34. Committing Offsets
      • Automatic commit
          • Let the consumer commit for you
              enable.auto.commit = true
              auto.commit.interval.ms = 5000   (default, 5 seconds)
      • Commit current offset
          • Gives more control over when offsets are committed
          • Offsets are only committed when the application explicitly chooses to do so
          • commitSync() retries the commit until it either succeeds or encounters a non-retriable failure
              enable.auto.commit = false
              consumer.commitSync();
      • Asynchronous commit
          • commitAsync() will not retry
          • commitAsync() optionally takes a callback that is triggered when the broker responds, including on failure
              enable.auto.commit = false
              consumer.commitAsync(callback);   // callback implements OffsetCommitCallback
  • 35. Message Delivery Semantics
      • At most once
          • Commit first, then process
          • Might lose some messages when processing fails
      • At least once
          • Process first, then commit
          • Might produce duplicates when the commit fails
      • Exactly once
          • Messages are pulled one or more times, processed only once, and delivery is guaranteed
          • Ideal for operational applications, as it guarantees no duplicates and no missing data
          • Many enterprise applications, like those used for credit card processing, require exactly-once semantics
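The difference between the first two modes comes down to the order of commit and process around a failure. The toy simulation below has no Kafka dependency and models a single crash followed by a restart from the last committed offset; the class name and the crash model are assumptions for illustration.

```java
// Hypothetical simulation: count how often each record is processed around one crash.
class DeliverySemanticsSketch {
    /**
     * @param numRecords          records at offsets 0..numRecords-1
     * @param crashAt             offset at which the consumer crashes once, between the two steps
     * @param commitBeforeProcess true = at-most-once ordering, false = at-least-once ordering
     * @return number of times each record was processed
     */
    static int[] run(int numRecords, int crashAt, boolean commitBeforeProcess) {
        int[] processed = new int[numRecords];
        int committed = 0;        // next offset to read after a restart
        boolean crashed = false;
        while (committed < numRecords) {
            int offset = committed;
            if (commitBeforeProcess) {
                committed = offset + 1;                  // commit first
                if (!crashed && offset == crashAt) {     // crash before processing
                    crashed = true;
                    continue;                            // restart from committed offset
                }
                processed[offset]++;
            } else {
                processed[offset]++;                     // process first
                if (!crashed && offset == crashAt) {     // crash before committing
                    crashed = true;
                    continue;                            // restart from committed offset
                }
                committed = offset + 1;
            }
        }
        return processed;
    }
}
```

With three records and a crash at offset 1, committing first leaves record 1 processed zero times (a lost message), while processing first leaves it processed twice (a duplicate).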
  • 37. Retention policy
      • For a specific amount of time
          • log.retention.hours
          • log.retention.minutes
          • log.retention.ms
      • For a specific total size of messages in a partition
          • log.retention.bytes
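Both retention rules operate on whole segments, oldest first. The sketch below is a simplified policy written for illustration: the broker's actual checks differ in detail, and the `Segment` type, the method name and the rule of never touching the last segment are assumptions of this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of time- and size-based retention over log segments.
class RetentionSketch {
    static final class Segment {
        final long baseOffset, lastModifiedMs, sizeBytes;
        Segment(long baseOffset, long lastModifiedMs, long sizeBytes) {
            this.baseOffset = baseOffset;
            this.lastModifiedMs = lastModifiedMs;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Base offsets of segments to delete; retentionBytes < 0 disables the size rule. */
    static List<Long> expired(List<Segment> oldestFirst, long nowMs, long retentionMs, long retentionBytes) {
        long total = 0;
        for (Segment s : oldestFirst) total += s.sizeBytes;
        List<Long> doomed = new ArrayList<>();
        for (int i = 0; i < oldestFirst.size() - 1; i++) { // assumption: keep the active segment
            Segment s = oldestFirst.get(i);
            boolean tooOld = nowMs - s.lastModifiedMs > retentionMs;
            boolean tooBig = retentionBytes >= 0 && total > retentionBytes;
            if (!tooOld && !tooBig) break;                 // newer segments are kept too
            doomed.add(s.baseOffset);
            total -= s.sizeBytes;
        }
        return doomed;
    }
}
```

Because deletion works per segment, data can outlive the configured retention by up to one segment's worth of time or size.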
  • 38. Log Compaction
      • Compaction is a process that retains just one message for each message key, usually the latest one.
      • config/server.properties: log.cleaner.enable=true
      • To enable log cleaning on a particular topic, set the topic-level property cleanup.policy=compact (log.cleanup.policy is the broker-level default)
  • 39. Key points about log compaction
      • min.cleanable.dirty.ratio (topic level; log.cleaner.min.cleanable.ratio at the broker level) determines how "dirty" a topic needs to be before it is cleaned. You can set it to 0.01 to be aggressive in cleaning
      • Log compaction runs on its own threads, and it defaults to 1 thread. It isn't unusual for a cleaner thread to die.
      • Compaction is done in the background by periodically recopying log segments
      • Log compaction never happens on the last (active) segment. Segments can be rolled over based on time, size, or both. The default time-based rollover is 7 days
  • 40. How Compaction works
      • As of Kafka 0.9.0.1, the configuration parameter log.cleaner.enable is true by default.
      • This means topics with cleanup.policy=compact are now compacted by default, and 128 MB of heap is allocated to the cleaner process via log.cleaner.dedupe.buffer.size
  • 41. Deletion of messages
      • Compaction also allows for deletes.
      • A message with a key and a null payload is treated as a delete from the log.
      • This delete marker causes any prior message with that key to be removed (as would any new message with that key), but delete markers are special in that they are themselves cleaned out of the log after a period of time to free up space
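The effect of compaction and delete markers on a log can be sketched over an in-memory list. This models only the outcome (latest record per key, with delete markers eventually dropped), not the cleaner's segment recopying; the class name and the `{key, value}` array representation are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the result of log compaction.
class CompactionSketch {
    /**
     * @param log            records in offset order, each a {key, value} pair; a null value is a delete marker
     * @param dropTombstones true once delete markers have aged out and may be removed themselves
     * @return surviving records, still in offset order
     */
    static List<String[]> compact(List<String[]> log, boolean dropTombstones) {
        Map<String, Integer> lastIndex = new HashMap<>();
        for (int i = 0; i < log.size(); i++) {
            lastIndex.put(log.get(i)[0], i);       // remember the latest position of each key
        }
        List<String[]> out = new ArrayList<>();
        for (int i = 0; i < log.size(); i++) {
            String[] record = log.get(i);
            if (lastIndex.get(record[0]) != i) continue;         // superseded by a later record
            if (record[1] == null && dropTombstones) continue;   // aged-out delete marker
            out.add(record);
        }
        return out;
    }
}
```

Surviving records keep their relative order, and in a real log they also keep their original offsets, which is why offsets in a compacted topic can have gaps.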
  • 42. Web(li)ography
      • https://cwiki.apache.org/confluence/display/KAFKA/FAQ
      • https://www.ibm.com/developerworks/linux/library/j-zerocopy/
      • http://kafka.apache.org/documentation.html#brokerconfigs
      • http://events.linuxfoundation.org/sites/events/files/slides/The%20Best%20of%20Apache%20Kafka%20Architecture.pdf
      • https://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0.9-consumer-client
      • https://apache.googlesource.com/kafka/+/0.8.1/core/src/main/scala/kafka/log/LogSegment.scala
      • https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Enriched+Message+Metadata
      • https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
      • http://www.shayne.me/blog/2015/2015-06-25-everything-about-kafka-part-2/
      • https://www.safaribooksonline.com/library/view/kafka-the-definitive/9781491936153/ch03.html#writing_messages_to_kafka
      • https://kafka.apache.org/090/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
      • https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design