The first presentation for the Kafka Meetup @ LinkedIn (Bangalore), held on 2015/12/5
It provides a brief introduction to the motivation for building Kafka and how it works from a high level.
Please download the presentation if you wish to see the animated slides.
5. Kafka Overview
▪ High-throughput distributed messaging system
▪ Kafka guarantees:
– At least once delivery
– Strong ordering
▪ Developed at LinkedIn and open sourced in early 2011
▪ Implemented in Scala and Java
11. How is Kafka used at LinkedIn?
▪ Monitoring (inGraphs)
▪ User tracking
▪ Email and SMS notifications
▪ Stream processing (Samza)
▪ Database Replication
12. Facts and figures
▪ Over 1,300,000,000,000 messages are produced to Kafka every day at LinkedIn
▪ 300 Terabytes of inbound and 900 Terabytes of outbound traffic
▪ 4.5 million messages per second on a single cluster
▪ Kafka runs on ~1300 servers at LinkedIn
22. Kafka at LinkedIn
▪ Multiple data centers
▪ Mirror data
▪ Cluster Types
– Tracking
– Metrics
– Queuing
▪ Data transport from applications to Hadoop, and back
23. Metrics collection
▪ Building Blocks
– Sensors
– RRD
– Front end
▪ Facts & Figures
– 320,000,000 metrics collected per minute
– 530 TB of disk space
– Over 210,000 metrics collected per service
27. How Can You Get Involved?
▪ http://kafka.apache.org
▪ Join the mailing lists
– users@kafka.apache.org
▪ irc.freenode.net - #apache-kafka
▪ Contribute
SRE stands for Site Reliability Engineering.
SRE combines several roles that fit together into one Operations position.
Foremost, we are administrators. We manage all of the systems in our area.
We are also architects. We do capacity planning for our deployments, plan out our infrastructure in new datacenters, and make sure all the pieces fit together.
And we are also developers. We identify tools we need, both to make our jobs easier and to keep our users happy, and we write and maintain them.
At the end of the day, our job is to keep the site running, always.
Kafka is a distributed, partitioned, replicated commit log. Kafka guarantees at-least-once delivery of messages and strong ordering on a per-partition basis.
Some of the companies powered by Kafka. Source: https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
Allows retention of data, which is a huge plus as it makes bootstrapping a new service from a past point of time easy.
There is durability due to redundancy at the partition level
Horizontally scalable
Most of the reads that hit the Kafka brokers are served from memory, which results in low-latency reads for any consumer that is relatively caught up
Custom data expiry rule
Apache Kafka was built at LinkedIn with a specific purpose in mind: to serve as a central repository of data streams.
There were two major motivations:
1) The first problem was how to transport data between systems. We had lots of data systems, and each of these needed reliable feeds of data in a geographically distributed environment
2) The second part of this problem was the need to do richer analytical data processing—the kind of thing that would normally happen in a data warehouse or Hadoop cluster—but with very low latency
It was evident that a system that catered to both the above needs would need to have high throughput and be horizontally scalable as well.
Initially, our approach was very ad hoc: we built custom piping between systems and applications on an as-needed basis and shoe-horned any asynchronous processing into request-response web services. Over time this setup got more and more complex as we ended up building pipelines between all kinds of different systems.
After we introduced Kafka, the producers and the consumers got completely decoupled and this allowed services to just connect to a central system for all their data production/consumption needs without worrying about the other services which may be consuming/producing this data.
We have many use cases for Kafka at LinkedIn; here are summaries of a few of them.
Every application emits metrics into Kafka and we have systems that read and store this data to generate Graphs and thresholds
User tracking covers all website activities: clicks, page views, and experiments which we turn on for subsets of users. Each time you visit LinkedIn, many different services are called to generate the page you are looking at, and each service sends a message to Kafka with details of that request. We later analyze all of that data with a Samza job that builds a full call tree for the particular request. We can then use this data to troubleshoot issues on the site.
Samza, by the way, is another open source product developed at LinkedIn that our team supports.
All of the emails that get sent out from LinkedIn go through Kafka at least once, and often a few times. They are often generated in Hadoop and sent to a production system using Kafka, which then decorates the emails with additional information and sends them back into Kafka for another application to read and turn into an actual email.
We stream changes to our search indexes in real time through Kafka to allow us to update search results in real time.
We also use Kafka combined with Apache Samza to standardize things like Job titles, phone numbers and addresses.
We are also currently exploring the use case of using Kafka to replicate databases. The rough idea is that the stream of transactions received by one database can be copied through Kafka to another database and replayed in the same order to achieve the same state as the first database.
All of the previous use cases I described, and many more, add up to a ton of data: 1.3 trillion messages per day.
As is evident, the total read traffic is three times the write traffic. This is where data retention really shines, as Kafka does not have to push the data to consumers every time it is read. The data resides on disk, and any consumer can connect and start reading it from the Kafka cluster.
We replicate most of the data between datacenters to keep applications in sync.
Simple data structure
Writes happen at the tail
Messages are in chronological order from head to tail
Easy movement in the stream by offset
Allows read scalability
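The log structure described in the notes above can be sketched as a minimal append-only list; the class and method names here are invented for illustration, not Kafka's actual API:

```python
class CommitLog:
    """A minimal sketch of a per-partition commit log (illustrative only,
    not Kafka's real implementation)."""

    def __init__(self):
        self._messages = []  # the list index doubles as the message offset

    def append(self, message):
        """Writes always happen at the tail; returns the new message's offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset, max_messages=10):
        """Any consumer can start reading from any retained offset."""
        return self._messages[offset:offset + max_messages]


log = CommitLog()
for event in ["signup", "login", "click", "logout"]:
    log.append(event)

old = log.read(0, 2)   # a new service can bootstrap from the oldest retained offset
new = log.read(2)      # a caught-up consumer reads from a later offset
```

Because the data is retained rather than deleted on read, the same log serves both the bootstrapping reader and the caught-up one, which is the read scalability point above.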
A “message” is a discrete unit of data within Kafka
Clients who send data into Kafka are called Producers
Clients who read data from Kafka are called Consumers
Every message that gets sent to Kafka belongs to a topic, which allows different types of data to be sent into a single cluster. Each topic is then divided into multiple partitions for parallelism.
These partitions exist across the Kafka servers (brokers) that make up the Kafka cluster.
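Keyed messages are routed to partitions by hashing the key. As a sketch: the real Java producer uses a murmur2 hash, so `crc32` below is only a stand-in, and the key names are made up:

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Pick a partition for a keyed message. The real Java producer hashes
    keys with murmur2; crc32 is just a stand-in deterministic hash."""
    return zlib.crc32(key) % num_partitions

# Messages with the same key always land in the same partition, which is
# what makes per-partition ordering useful: all events for one member
# stay ordered relative to each other, while different members can be
# processed in parallel across partitions.
p1 = partition_for(b"member-42", 4)
p2 = partition_for(b"member-42", 4)
```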
This diagram depicts how data is written into partitions.
Messaging traditionally has two models: queuing and publish-subscribe. In a queue, a pool of consumers may read from a server and each message goes to one of them; in publish-subscribe the message is broadcast to all consumers. Kafka offers a single consumer abstraction that generalizes both of these—the consumer group.
Consumers label themselves with a consumer group name, and each message published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes within a single host, or on separate machines.
If all the consumer instances have the same consumer group, then this works just like a traditional queue balancing load over the consumers.
If all the consumer instances have different consumer groups, then this works like publish-subscribe and all messages are broadcast to all consumers.
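The two behaviors above can be simulated in a few lines, using round-robin delivery as a stand-in for Kafka's real partition assignment (all group and consumer names are invented):

```python
from collections import defaultdict
from itertools import cycle

def deliver(messages, groups):
    """Deliver each message to exactly one consumer instance per
    subscribing group (round-robin stands in for partition assignment)."""
    received = defaultdict(list)
    pickers = {name: cycle(members) for name, members in groups.items()}
    for msg in messages:
        for picker in pickers.values():
            received[next(picker)].append(msg)
    return received

# One group with two instances behaves like a load-balanced queue:
queue = deliver(["m1", "m2", "m3", "m4"], {"grp": ["c1", "c2"]})
# Two single-instance groups behave like publish-subscribe:
pubsub = deliver(["m1", "m2"], {"g1": ["a"], "g2": ["b"]})
```

In the first call each consumer sees half the messages; in the second, every group sees every message.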
This shows how the data flows through a cluster.
Kafka is a publish-subscribe messaging system, in which there are four components:
- Broker (what we call the Kafka server)
- Zookeeper (which serves as a data store for information about the cluster and consumers)
- Producer (sends data into the system)
- Consumer (reads data out of the system)
Data is organized into topics (here we show a topic named “A”) and topics are split into partitions (we have partitions 0 and 1 here).
A “message” is a discrete unit of data within Kafka. Producers create messages and send them into the system. The broker stores them, and any number of consumers can then read those messages.
In order to provide scalability, we have multiple brokers. By spreading out the partitions, we can handle more messages in any topic.
This also provides redundancy. We can now replicate partitions on separate brokers. When we do this, one broker is the designated “leader” for each partition. This is the only broker that producers and consumers connect to for that partition. The brokers that hold the replicas are designated “followers” and all they do with the partition is keep it in sync with the leader.
When a broker fails, one of the brokers holding an in-sync replica takes over as the leader for the partition. The producer and consumer clients have logic built-in to automatically rebalance and find the new leader when the cluster changes like this. When the original broker comes back online, it gets its replicas back in sync, and then it functions as the follower.
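The failover step can be sketched as follows; `elect_leader` is a hypothetical helper, not a Kafka API, and it only models the clean-versus-unclean distinction:

```python
def elect_leader(leader, isr, failed_broker):
    """Return (new_leader, new_isr) after a broker fails. A clean election
    promotes an in-sync follower; if no in-sync replica is left, only an
    unclean election (with data loss) would be possible."""
    isr = [b for b in isr if b != failed_broker]
    if leader != failed_broker:
        return leader, isr   # leader is fine; just shrink the ISR
    if not isr:
        raise RuntimeError("no in-sync replica: only an unclean election remains")
    return isr[0], isr       # promote the first in-sync follower

# Broker 1 led the partition with brokers 2 and 3 as in-sync followers:
new_leader, new_isr = elect_leader(leader=1, isr=[1, 2, 3], failed_broker=1)
```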
Kafka is incredibly fast for a few reasons:
Most reads never actually hit the disk – usually consumers are caught up.
Disk seek time is reduced because I/O is linear
On a read, Kafka uses the sendfile() system call, which lets data be written directly to a socket without first being copied into the application. This reduces data copies and context switches.
Batching allows higher throughput and better compression
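One way to see why batching improves compression: similar messages share structure that a per-message compressor cannot exploit. A small sketch with gzip and invented event fields:

```python
import gzip
import json

# 100 small, similar tracking events (field names are hypothetical)
events = [json.dumps({"page": "/feed", "member": i}).encode() for i in range(100)]

# Compressing each message individually pays the redundancy cost 100 times:
one_by_one = sum(len(gzip.compress(e)) for e in events)
# Compressing one batch lets the shared structure compress away once:
as_batch = len(gzip.compress(b"\n".join(events)))
```

The batched form comes out substantially smaller, and fewer, larger requests also mean higher throughput per network round trip.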
We run Kafka on hardware with lots of disk spindles in a RAID 10 configuration.
We put our Zookeeper clusters on SSDs which brought our average request latency down to zero milliseconds
We monitor Kafka in several different ways with tooling developed by the SRE team.
Lag monitoring: lag is defined as the number of messages between the newest message available in Kafka and the last message the consumer has read.
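Lag for a consumer can be computed per partition and summed; a minimal sketch with hypothetical topic and partition names:

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Total lag: for each partition, the newest offset in Kafka minus
    the consumer's last committed offset (illustrative sketch)."""
    return sum(end - committed_offsets.get(partition, 0)
               for partition, end in log_end_offsets.items())

lag = consumer_lag({"tracking-0": 1500, "tracking-1": 1200},
                   {"tracking-0": 1400, "tracking-1": 1200})
# partition 0 is 100 messages behind; partition 1 is fully caught up
```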
Under Replicated Partitions, this is the count of Follower replicas which have fallen behind the leader. This metric is reported per broker. In the healthy state these should always be zero.
Unclean leader elections. When this happens, data has been lost. It occurs when there is a leader failure and no follower was in sync at that time.
Burrow is a tool developed and open sourced by one of the Kafka SREs at LinkedIn. It is our new way of monitoring Lag within Kafka which uses velocity calculations to determine if a consumer is falling behind.
We have also developed tooling to ensure all brokers within a cluster are doing the same amount of work.
In the size-based balance we ensure that each broker has the same amount of data on disk. If the brokers are not within our defined threshold, we move the optimal number of partitions around to restore balance.
In the partition-based balance we ensure that each broker has the same number of partitions. If the brokers are not within our defined threshold, we move the optimal number of partitions around to restore balance.
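The detection half of the partition-count balance can be sketched with a simple deviation-from-mean check; the threshold and broker names are invented:

```python
def out_of_balance(broker_partitions, threshold=1):
    """Return brokers whose partition count deviates from the cluster mean
    by more than `threshold`; these are candidates for partition moves."""
    counts = {b: len(parts) for b, parts in broker_partitions.items()}
    mean = sum(counts.values()) / len(counts)
    return sorted(b for b, c in counts.items() if abs(c - mean) > threshold)

unbalanced = out_of_balance({
    "broker1": ["t-0", "t-1", "t-2", "t-3", "t-4", "t-5"],  # 6 partitions
    "broker2": ["t-6", "t-7"],                              # 2 partitions
    "broker3": ["t-8", "t-9", "t-10", "t-11"],              # 4 partitions
})
# brokers 1 and 2 are outside the threshold around the mean of 4
```

The size-based balance works the same way with bytes on disk in place of partition counts.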
Cluster types:
User activities on LinkedIn sites are tracked, and this data flows into the tracking clusters. LinkedIn has multiple colos, and users are served from a particular colo based on their unique ID. Tracking data goes to the local tracking cluster. We also have an aggregator cluster, which gets the data aggregated from the multiple colos using mirror makers. The downstream applications which process tracking data consume from the aggregator clusters.
The OS and applications generate metrics, and these metrics are used for understanding the state of the system. The values are pumped into a separate metrics cluster. More about metrics in the next slide.
Queuing cluster is used for the traditional queuing scenarios when you have multiple applications and you want to coordinate their activities.
We at LinkedIn use Kafka for pumping metrics into our graphing engine, inGraphs.
The basic idea is that we have services which expose a certain set of metrics using MBeans. These are picked up by sensors, processed, and pumped into Kafka. The enriched metrics are all consumed by a service which filters metrics by tags and pushes this data into RRDs. The RRDs are used to generate graphs which are served to the end user.
This is just a sample screenshot of final graphs in inGraphs. Different colors correspond to different hosts.
One new use case for Kafka at LinkedIn is for Database replication. In this diagram we show how this is done.
The database on the left streams its transaction log into Kafka. The data replicator consumes the transaction log stream from Kafka and replays it into the database on the right. This is a great method for doing cross-datacenter replication of databases.
One obvious advantage over traditional master-slave database replication is the decoupling of the two databases.
To initially start the secondary database you first must create a backup snapshot of the data in DB1, and load it into DB2. After that DB2 can listen to the transaction log stream via Data Replicator and stay in sync.
This also works for a master-master relationship, where you stream the transactions originating in the second colo back to the database in the first colo. Additional filtering logic is added to Data Replicator to ensure that a loop is not created; in other words, a transaction originating in colo A needs to be mirrored to colo B but must not be replicated back to colo A.
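The loop-prevention filter can be sketched by tagging each transaction with its origin colo; all names, and the dict standing in for a database, are illustrative:

```python
def replay(transactions, target_db, local_colo):
    """Replay a transaction stream into a database, skipping transactions
    that originated in this colo so master-master replication cannot loop."""
    for txn in transactions:
        if txn["origin"] == local_colo:
            continue  # this colo produced it; replaying would create a loop
        target_db[txn["key"]] = txn["value"]

db_b = {}  # in practice DB2 starts from a snapshot of DB1; empty here for brevity
stream = [
    {"origin": "A", "key": "k1", "value": 1},   # remote transaction: apply
    {"origin": "B", "key": "k2", "value": 2},   # local transaction: skip
]
replay(stream, db_b, local_colo="B")
```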
So how can you get more involved in the Kafka community?
The most obvious answer is to go to kafka.apache.org. From there you can:
1) Join the mailing lists, either on the development or the user side
2) You can also dive into the source repository, and work on and contribute your own tools back.