Introduction to Kafka
Akash Vacher
2015/12/5
▪ Akash Vacher
SRE,
Data Infrastructure Streaming (Bengaluru)
LinkedIn
SRE?
▪ Site Reliability Engineers
– Administrators
– Architects
– Developers
▪ Keep the site running, always
Agenda
▪ Kafka Overview
▪ Some facts and figures
▪ Basic Kafka concepts
▪ Some use cases
▪ Q and A
Kafka Overview
▪ High-throughput distributed messaging system
▪ Kafka guarantees:
– At-least-once delivery
– Strong ordering (per partition)
▪ Developed at LinkedIn and open sourced in early 2011
▪ Implemented in Scala and Java
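To make the two guarantees on this slide concrete (the speaker notes clarify that ordering holds per partition), here is a toy, single-process simulation rather than the Kafka client API. It assumes a consumer that processes a message first and commits its offset second; a crash between those two steps means the message is read again after restart. The names `Partition`, `consume` and `crash_after` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Partition:
    """A toy single partition: an append-only list of messages."""
    messages: list = field(default_factory=list)

    def append(self, msg):
        self.messages.append(msg)
        return len(self.messages) - 1  # offset of the new message

def consume(partition, committed, processed, crash_after=None):
    """Process messages from the committed offset onward, in offset order.

    Each message is processed first and its offset committed second.
    If crash_after is set, simulate a crash right after processing that
    offset but before committing it. Returns the new committed offset.
    """
    offset = committed
    while offset < len(partition.messages):
        processed.append(partition.messages[offset])  # side effect first
        if offset == crash_after:
            return committed  # crash: the commit for this offset is lost
        offset += 1
        committed = offset  # commit only after successful processing
    return committed

p = Partition()
for m in ["a", "b", "c", "d"]:
    p.append(m)

seen = []
committed = consume(p, 0, seen, crash_after=2)  # crashes after processing "c"
committed = consume(p, committed, seen)         # restart from the last commit
print(seen)       # ['a', 'b', 'c', 'c', 'd']
print(committed)  # 4
```

The repeated `"c"` is the at-least-once part: nothing is skipped, but duplicates are possible, so consumers are often written to be idempotent. Offsets within the partition are still read strictly in order.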
Kafka users
Source: https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
Attributes of a Kafka Cluster
• Disk Based
• Durable
• Scalable
• Low Latency
• Finite Retention
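Finite retention can be sketched as deleting whole log segments once their newest message ages out. This is a simplified model loosely based on Kafka's time-based retention: the `retention_ms` parameter echoes the spirit of the `retention.ms` topic setting, but the function, the `Segment` type and the numbers are hypothetical.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000

@dataclass
class Segment:
    base_offset: int   # offset of the first message in this segment
    newest_ts_ms: int  # timestamp of the newest message in this segment

def apply_retention(segments, now_ms, retention_ms):
    """Delete whole segments whose newest message is older than retention_ms.

    Segments are ordered oldest first. The last (active) segment is always
    kept so the log can keep accepting writes.
    """
    if not segments:
        return []
    *closed, active = segments
    kept = [s for s in closed if now_ms - s.newest_ts_ms <= retention_ms]
    return kept + [active]

segments = [Segment(0, 0), Segment(1000, 3 * DAY_MS),
            Segment(2000, 6 * DAY_MS), Segment(3000, 8 * DAY_MS)]
kept = apply_retention(segments, now_ms=8 * DAY_MS, retention_ms=7 * DAY_MS)
print([s.base_offset for s in kept])  # [1000, 2000, 3000]
print(kept[0].base_offset)            # 1000: earliest offset still readable
```

Anything at or after the earliest surviving offset can still be replayed, which is what makes bootstrapping a new service from a past point in time possible, as the speaker notes describe.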
Motivation
▪ Unified platform to handle all real time data feeds
▪ High throughput
▪ Stream Processing
▪ Horizontally scalable
Before
After
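The speaker notes for these two slides describe moving from ad hoc point-to-point piping to one central system. A rough count shows why that matters, under the hypothetical assumption that each of N source systems needs a feed into each of M destination systems.

```python
def point_to_point_pipelines(n_sources, n_sinks):
    # Before: one custom pipeline per (source, destination) pair.
    return n_sources * n_sinks

def central_log_connections(n_sources, n_sinks):
    # After: each system connects once to the central log.
    return n_sources + n_sinks

for n, m in [(2, 2), (5, 5), (10, 20)]:
    print(n, m, point_to_point_pipelines(n, m), central_log_connections(n, m))
# 2 2 4 4
# 5 5 25 10
# 10 20 200 30
```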
How is Kafka used at LinkedIn?
▪ Monitoring (InGraphs)
▪ User tracking
▪ Email and SMS notifications
▪ Stream processing (Samza)
▪ Database Replication
Facts and figures
▪ Over 1,300,000,000,000 (1.3 trillion) messages are produced to Kafka every day at LinkedIn
▪ 300 terabytes of inbound and 900 terabytes of outbound traffic
▪ 4.5 million messages per second on a single cluster
▪ Kafka runs on ~1300 servers at LinkedIn
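A quick sanity check of these figures. The daily message count gives the average rate across all of LinkedIn's clusters (it says nothing about peaks or about any single cluster), and the traffic figures give the read-to-write ratio the speaker notes call "almost thrice". The slide does not say over what period the terabytes are measured, so only their ratio is used.

```python
MESSAGES_PER_DAY = 1_300_000_000_000  # "over 1.3 trillion" from the slide
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400

avg_msgs_per_sec = MESSAGES_PER_DAY / SECONDS_PER_DAY
fan_out = 900 / 300  # outbound TB / inbound TB from the slide

print(f"{avg_msgs_per_sec:,.0f} messages/s on average")  # 15,046,296 messages/s on average
print(fan_out)                                          # 3.0
```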
Building blocks
The humble log
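The speaker notes summarize the log as: writes happen at the tail, messages are in chronological order, and readers move through the stream by offset. A minimal in-memory sketch of that structure (the `Log` class is hypothetical, not Kafka's storage code):

```python
class Log:
    """A toy append-only log: writes go to the tail, reads are by offset."""

    def __init__(self):
        self._entries = []

    def append(self, record):
        self._entries.append(record)
        return len(self._entries) - 1  # offset assigned to the record

    def read(self, offset, max_records=10):
        # Reading never removes data, so any number of readers can
        # consume the same log independently from their own offsets.
        return self._entries[offset:offset + max_records]

    def end_offset(self):
        return len(self._entries)  # offset the next append will receive

log = Log()
for event in ["login", "view", "click", "logout"]:
    log.append(event)

print(log.read(0, 2))    # ['login', 'view']    (a reader at the head)
print(log.read(2))       # ['click', 'logout']  (another reader, further along)
print(log.end_offset())  # 4
```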
Anatomy of a topic
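Per the speaker notes, a topic is split into partitions spread across brokers. One common way to pick a partition is to hash the message key, so that all messages with the same key land in the same partition and keep their relative order. This sketch assumes keyed messages and uses CRC32 purely as a stand-in hash (Kafka's Java client uses murmur2 for this); `partition_for` is a hypothetical name.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash for illustration only; not Kafka's partitioner.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

topic = defaultdict(list)  # partition number -> list of (key, value)
sends = [("alice", "v1"), ("bob", "v1"), ("alice", "v2"),
         ("carol", "v1"), ("alice", "v3")]
for key, value in sends:
    topic[partition_for(key)].append((key, value))

# All of alice's messages land in one partition, in the order they were sent.
p = partition_for("alice")
print([v for k, v in topic[p] if k == "alice"])  # ['v1', 'v2', 'v3']
```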
Consumer groups
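The consumer group behavior in the speaker notes (one consumer per group receives each message; one shared group behaves like a queue, separate groups behave like publish-subscribe) can be simulated by assigning partitions to group members. Round-robin assignment is used here as one simple strategy; the function names are hypothetical.

```python
def assign(partitions, consumers):
    """Round-robin partitions over the members of one consumer group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

def deliver(topic, groups):
    """Return which consumer receives each message.

    topic maps partition -> list of messages; groups maps a group name to
    its member consumers. Within a group, each partition (and hence each
    message) goes to exactly one member; every group gets every message.
    """
    received = {}  # (group, consumer) -> messages
    for group, members in groups.items():
        for consumer, parts in assign(topic.keys(), members).items():
            received[(group, consumer)] = [m for p in parts for m in topic[p]]
    return received

topic = {0: ["m0", "m3"], 1: ["m1", "m4"], 2: ["m2"]}

# Queue semantics: one group, load is split between its members.
queue = deliver(topic, {"billing": ["c1", "c2"]})
print(queue)  # {('billing', 'c1'): ['m0', 'm3', 'm2'], ('billing', 'c2'): ['m1', 'm4']}

# Publish-subscribe: each consumer in its own group sees everything.
pubsub = deliver(topic, {"g1": ["c1"], "g2": ["c2"]})
print(sorted(pubsub[("g1", "c1")]) == sorted(pubsub[("g2", "c2")]))  # True
```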
Bird’s eye view
Kafka in action
[Diagram: Producer, Consumer, Zookeeper, and Brokers holding replicas of topic A's partitions P0 and P1]
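The failover described in the speaker notes (an in-sync replica takes over when the leader's broker fails) and the unclean election mentioned on the monitoring slides can be modeled as a choice over the partition's replicas. This is a toy model: real elections are driven by the cluster controller, and whether the unclean fallback is allowed is governed by the `unclean.leader.election.enable` setting. `elect_leader` and its arguments are hypothetical.

```python
def elect_leader(replicas, isr, alive, allow_unclean=False):
    """Choose a new leader for one partition after a broker failure (toy model).

    replicas: brokers holding the partition, in preference order.
    isr: brokers whose copy was fully in sync with the failed leader.
    alive: brokers that are currently up.
    Returns (leader, clean), or (None, False) if no leader can be chosen.
    """
    for broker in replicas:
        if broker in isr and broker in alive:
            return broker, True   # clean: no committed messages are lost
    if allow_unclean:
        for broker in replicas:
            if broker in alive:
                return broker, False  # unclean: messages this replica lacks are lost
    return None, False  # partition stays offline until an in-sync replica returns

# Broker 1 (the leader) fails; broker 2 was in sync and takes over.
print(elect_leader([1, 2, 3], isr={1, 2}, alive={2, 3}))                   # (2, True)
# Only the failed leader was in sync: either wait, or elect uncleanly.
print(elect_leader([1, 2, 3], isr={1}, alive={2, 3}))                      # (None, False)
print(elect_leader([1, 2, 3], isr={1}, alive={2, 3}, allow_unclean=True))  # (2, False)
```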
Performance recipe
▪ OS page cache
▪ Linear IO, never fear the file system!
▪ sendfile() system call
▪ Message batching
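The speaker notes credit batching with higher throughput and better compression. The compression half is easy to demonstrate with the standard library: similar messages compressed together share repeated structure, while messages compressed one at a time each pay the full overhead. The event format below is hypothetical, and zlib stands in for whichever codec a producer is configured with.

```python
import json
import zlib

# Hypothetical tracking events; real payloads and formats will differ.
events = [json.dumps({"member": i, "page": "/feed", "action": "view"}).encode()
          for i in range(200)]

# One compressed payload per message, as if every message were sent alone.
unbatched = sum(len(zlib.compress(e)) for e in events)

# One compressed payload for a single batch holding all the messages.
batched = len(zlib.compress(b"\n".join(events)))

print(unbatched, batched, batched < unbatched)  # the last value printed is True
```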
Operating Kafka
▪ Broker Hardware
– Cisco C240, Intel Xeon quad-core, 64GB RAM, 14-disk RAID 10
▪ Zookeeper Hardware
– 5 + 1 ensemble, 64GB RAM, 500GB SSD
Operating Kafka
▪ Monitoring
– Under Replicated Partitions
– Unclean leader election
– Lag monitoring
– Burrow
▪ Cluster rebalance
– Size-wise rebalance
– Partition-wise rebalance
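The lag definition and the balancing checks described in the speaker notes can be sketched in a few functions. Lag per partition is the newest offset in Kafka minus the consumer's committed offset; `falling_behind` is a deliberately crude trend check, not Burrow's algorithm; and `size_rebalance` is a hypothetical greedy strategy, not LinkedIn's actual tooling.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: newest offset in Kafka minus the committed offset."""
    return {p: end - committed_offsets.get(p, 0)
            for p, end in log_end_offsets.items()}

def falling_behind(total_lag_samples):
    """Toy trend check: lag grew between every pair of consecutive samples.

    Much simpler than Burrow's actual evaluation rules.
    """
    return all(b > a for a, b in zip(total_lag_samples, total_lag_samples[1:]))

def size_rebalance(brokers, threshold):
    """Greedy size-wise rebalance (hypothetical, not LinkedIn's tooling).

    brokers maps broker -> {partition: size_gb} and is modified in place.
    Repeatedly moves the largest partition that still narrows the gap from
    the fullest broker to the emptiest one, until the gap is within threshold.
    Returns the list of (partition, from_broker, to_broker) moves.
    """
    moves = []
    while True:
        load = {b: sum(parts.values()) for b, parts in brokers.items()}
        hi, lo = max(load, key=load.get), min(load, key=load.get)
        gap = load[hi] - load[lo]
        if gap <= threshold:
            return moves
        # Moving a partition of size s narrows the gap only if 0 < s < gap.
        fits = [p for p, s in brokers[hi].items() if 0 < s < gap]
        if not fits:
            return moves
        p = max(fits, key=brokers[hi].get)
        brokers[lo][p] = brokers[hi].pop(p)
        moves.append((p, hi, lo))

print(consumer_lag({0: 1500, 1: 900}, {0: 1400, 1: 900}))  # {0: 100, 1: 0}
print(falling_behind([100, 250, 400]))                     # True

brokers = {"b1": {"t-0": 40, "t-1": 30, "t-2": 10},
           "b2": {"t-3": 20},
           "b3": {"t-4": 20}}
moves = size_rebalance(brokers, threshold=10)
print(moves)  # [('t-0', 'b1', 'b2'), ('t-3', 'b2', 'b3')]
```

A partition-wise balance can reuse the same loop with every partition's size set to 1, so that it evens out partition counts instead of bytes on disk.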
Kafka at LinkedIn
▪ Multiple data centers
▪ Mirror data
▪ Cluster Types
– Tracking
– Metrics
– Queuing
▪ Data transport from applications to Hadoop, and back
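Mirroring local tracking clusters into an aggregate cluster (done with MirrorMaker, according to the speaker notes) boils down to consuming from each source and producing to the destination while remembering how far each source has been copied. In this sketch plain lists stand in for one partition per cluster, and `mirror` is a hypothetical function, not MirrorMaker's interface.

```python
def mirror(source_log, dest_log, position):
    """Copy messages not yet mirrored from a source log to a destination log.

    position is how far this mirror has already copied from the source.
    Returns the new position, which a real mirror would commit back to the
    source cluster like any other consumer.
    """
    new = source_log[position:]
    dest_log.extend(new)
    return position + len(new)

local = {"colo-a": ["a1", "a2"], "colo-b": ["b1"]}
aggregate = []
positions = {colo: 0 for colo in local}

for colo, log in local.items():
    positions[colo] = mirror(log, aggregate, positions[colo])

local["colo-a"].append("a3")  # new tracking events keep arriving
for colo, log in local.items():
    positions[colo] = mirror(log, aggregate, positions[colo])

print(aggregate)  # ['a1', 'a2', 'b1', 'a3']
print(positions)  # {'colo-a': 3, 'colo-b': 1}
```

Because each source position is tracked separately, rerunning the loop only copies new messages. Messages from different colos interleave in the aggregate, so ordering is only preserved per source.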
Metrics collection
▪ Building Blocks
– Sensors
– RRD
– Front end
▪ Facts & Figures
– 320,000,000 metrics collected per minute
– 530 TB of disk space
– Over 210,000 metrics collected per service
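RRD here presumably refers to round-robin database files of the kind RRDtool popularized: fixed-size time-series storage where each slot holds one consolidated value per time step and old slots are overwritten. A toy version of one such archive, with hypothetical names and averaging as the only consolidation function:

```python
class RoundRobinArchive:
    """Toy RRD-style archive: a fixed-size ring of per-step averages."""

    def __init__(self, step_s, slots):
        self.step_s = step_s
        self.slots = [None] * slots  # each slot holds (step_index, total, count)

    def update(self, ts_s, value):
        step = ts_s // self.step_s
        i = step % len(self.slots)
        slot = self.slots[i]
        if slot is not None and step < slot[0]:
            return  # too old: its slot already holds newer data, so drop it
        if slot is None or slot[0] != step:
            self.slots[i] = (step, value, 1)  # start a new step, overwriting old data
        else:
            self.slots[i] = (step, slot[1] + value, slot[2] + 1)

    def series(self):
        """Return (step_start_s, average) pairs, oldest first."""
        filled = sorted(s for s in self.slots if s is not None)
        return [(step * self.step_s, total / n) for step, total, n in filled]

rra = RoundRobinArchive(step_s=60, slots=3)
for ts, v in [(0, 10), (30, 20), (60, 5), (120, 7), (180, 9)]:
    rra.update(ts, v)
print(rra.series())  # [(60, 5.0), (120, 7.0), (180, 9.0)]
```

The first minute's average (15.0) has already been overwritten by the fourth minute, which is how disk usage stays fixed per metric no matter how long collection runs.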
InGraphs
Kafka for database replication - Master-slave
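Master-slave replication, as the speaker notes describe it, is a snapshot of DB1 loaded into DB2 followed by an in-order replay of DB1's transaction log from Kafka. In this sketch dicts stand in for the databases, a list stands in for a single-partition topic (so order is preserved), and `apply`, `write` and `replicate` are hypothetical names. The snapshot is trivially consistent with the recorded offset only because everything here is single-threaded.

```python
def apply(db, txn):
    """Apply one (op, key, value) transaction to a dict standing in for a database."""
    op, key, value = txn
    if op == "put":
        db[key] = value
    elif op == "delete":
        db.pop(key, None)

def write(db, log, txn):
    """A write on DB1: apply it and stream it into the transaction log topic."""
    apply(db, txn)
    log.append(txn)

def replicate(log, replica, offset):
    """Data Replicator: replay the log from offset, in order, onto the replica."""
    for txn in log[offset:]:
        apply(replica, txn)
    return len(log)  # where to resume next time

db1, log = {}, []
write(db1, log, ("put", "a", 1))
write(db1, log, ("put", "b", 2))

# Bootstrap DB2 from a snapshot, remembering the log offset it matches.
db2, offset = dict(db1), len(log)

write(db1, log, ("delete", "a", None))
write(db1, log, ("put", "c", 3))
offset = replicate(log, db2, offset)
print(db2 == db1, db2)  # True {'b': 2, 'c': 3}
```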
Kafka for database replication - Multi-master
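For the multi-master case, the speaker notes call for filtering so that a transaction originating in colo A is mirrored to colo B but never replayed back to A. One way to do this, assumed here, is to tag every transaction with its origin colo and keep that tag when it is replayed; the replicator then skips anything that started in its destination. `Colo` and `replicate` are hypothetical, and conflicting concurrent writes to the same key are not handled.

```python
class Colo:
    def __init__(self, name):
        self.name = name
        self.db = {}
        self.log = []  # the database's transaction log, streamed into Kafka

    def apply(self, txn):
        # Every applied transaction, local or replayed, lands in the log
        # with the colo where it originated.
        self.db[txn["key"]] = txn["value"]
        self.log.append(txn)

    def write(self, key, value):
        self.apply({"origin": self.name, "key": key, "value": value})

def replicate(src, dst, offset):
    """Replay src's log into dst, skipping transactions that began in dst."""
    for txn in src.log[offset:]:
        if txn["origin"] != dst.name:  # the loop-prevention filter
            dst.apply(txn)
    return len(src.log)

a, b = Colo("A"), Colo("B")
a.write("x", 1)
b.write("y", 2)

off_ab = replicate(a, b, 0)       # x goes A -> B
off_ba = replicate(b, a, 0)       # y goes B -> A; x is skipped
off_ab = replicate(a, b, off_ab)  # y is skipped on its way back to B
print(a.db == b.db, len(a.log), len(b.log))  # True 2 2
```

Without the origin check, `x` would come back to colo A from B's log, be applied and logged again, and keep bouncing between the colos.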
How Can You Get Involved?
▪ http://kafka.apache.org
▪ Join the mailing lists
– users@kafka.apache.org
▪ irc.freenode.net - #apache-kafka
▪ Contribute
Questions?

Speaker notes

  1. Kafka – a high-throughput messaging system
  2. SRE stands for Site Reliability Engineering. SRE combines several roles that fit together into one operations position. Foremost, we are administrators: we manage all of the systems in our area. We are also architects: we do capacity planning for our deployments, plan out our infrastructure in new datacenters, and make sure all the pieces fit together. And we are also developers: we identify the tools we need, both to make our jobs easier and to keep our users happy, and we write and maintain them. At the end of the day, our job is to keep the site running, always.
  3. Kafka is a distributed, partitioned, replicated commit log. Kafka guarantees at-least-once delivery of messages and strong ordering on a per-partition basis.
  4. Some of the companies powered by Kafka. Source: https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
  5. Kafka allows retention of data, which is a huge plus because it makes it easy to bootstrap a new service from a past point in time. Durability comes from redundancy at the partition level. Kafka is horizontally scalable. Most of the reads that hit the Kafka brokers are served from memory, which results in low-latency reads for a consumer that is relatively caught up. Data expiry rules can be customized.
  6. Apache Kafka was built at LinkedIn with a specific purpose in mind: to serve as a central repository of data streams. There were two major motivations. 1) The first problem was how to transport data between systems. We had lots of data systems, and each of them needed reliable feeds of data in a geographically distributed environment. 2) The second part of the problem was the need to do richer analytical data processing (the kind of thing that would normally happen in a data warehouse or Hadoop cluster), but with very low latency. It was evident that a system catering to both needs would have to offer high throughput and be horizontally scalable as well.
  7. Initially, our approach was very ad hoc: we built custom piping between systems and applications on an as-needed basis and shoehorned any asynchronous processing into request-response web services. Over time this setup grew more and more complex as we ended up building pipelines between all kinds of different systems.
  8. After we introduced Kafka, producers and consumers became completely decoupled. Services could simply connect to one central system for all their data production and consumption needs, without worrying about which other services might be consuming or producing that data.
  9. We have many use cases for Kafka at LinkedIn; here are summaries of a few of them. Every application emits metrics into Kafka, and we have systems that read and store this data to generate graphs and thresholds. User tracking covers all website activities: clicks, page views, and experiments that we turn on for subsets of users. Each time you visit LinkedIn, many different services are called to generate the page you are looking at, and each service sends a message to Kafka with details of that request. We later analyze all of that data with a Samza job that builds a full call tree for the particular request, which we can then use to troubleshoot issues on the site. Samza, by the way, is another open source product developed at LinkedIn that our team supports. All of the emails sent out from LinkedIn go through Kafka at least once, and often a few times. They are often generated in Hadoop and sent via Kafka to a production system, which decorates them with additional information and sends them back into Kafka for another application to read and turn into an actual email. We stream changes to our search indexes through Kafka so that search results update in real time. We also use Kafka combined with Apache Samza to standardize things like job titles, phone numbers and addresses. We are also currently exploring using Kafka to replicate databases. The rough idea is that the stream of transactions received by a database can be copied through Kafka to another database and replayed in the same order to reach the same state as the first database.
  10. All of the use cases I described, and many more, add up to a ton of data: 1.3 trillion messages per day. As is evident, the total read traffic is about three times the write traffic. This is where data retention really shines: Kafka does not have to push data to each consumer; the data resides on disk, and any consumer can connect to the Kafka cluster and start reading it. We replicate most of the data between datacenters to keep applications in sync.
  11. The log is a simple data structure. Writes happen at the tail, and messages are in chronological order from head to tail. It is easy to move around in the stream by offset, which allows read scalability.
  12. A “message” is a discrete unit of data within Kafka. Clients that send data into Kafka are called producers, and clients that read data from Kafka are called consumers. Every message sent to Kafka belongs to a topic, which allows different types of data to be sent into a single cluster. Each topic is divided into multiple partitions for parallelism, and these partitions are spread across the Kafka servers (brokers) that make up the cluster. This diagram depicts how data is written into partitions.
  13. Messaging traditionally has two models: queuing and publish-subscribe. In a queue, a pool of consumers may read from a server and each message goes to one of them; in publish-subscribe the message is broadcast to all consumers. Kafka offers a single consumer abstraction that generalizes both of these: the consumer group. Consumers label themselves with a consumer group name, and each message published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes on a single host, or on separate machines. If all the consumer instances have the same consumer group, this works just like a traditional queue, balancing load over the consumers. If all the consumer instances have different consumer groups, this works like publish-subscribe, and all messages are broadcast to all consumers.
  14. This shows how the data flows through a cluster.
  15. Kafka is a publish-subscribe messaging system with four components: the broker (what we call the Kafka server), Zookeeper (which serves as a data store for information about the cluster and consumers), the producer (which sends data into the system) and the consumer (which reads data out of the system). Data is organized into topics (here we show a topic named “A”), and topics are split into partitions (here, partitions 0 and 1). A “message” is a discrete unit of data within Kafka. Producers create messages and send them into the system, the broker stores them, and any number of consumers can then read those messages. To provide scalability, we have multiple brokers: by spreading out the partitions, we can handle more messages in any topic. This also provides redundancy, because we can replicate partitions on separate brokers. When we do this, one broker is the designated “leader” for each partition; it is the only broker that producers and consumers connect to for that partition. The brokers that hold the replicas are designated “followers”, and all they do with the partition is keep it in sync with the leader. When a broker fails, one of the brokers holding an in-sync replica takes over as leader for the partition. The producer and consumer clients have built-in logic to automatically rebalance and find the new leader when the cluster changes like this. When the original broker comes back online, it brings its replicas back in sync and then functions as a follower.
  16. Kafka is incredibly fast for a few reasons. Most reads never actually hit the disk, because consumers are usually caught up and are served from the OS page cache. Linear I/O reduces disk head seek time. On reads, Kafka uses the sendfile() system call, which writes data directly to a socket without first copying it into the application; this avoids extra copies and reduces context switching. Batching allows higher throughput and better compression.
  17. We run Kafka on hardware with lots of disk spindles in a RAID 10 configuration. We put our Zookeeper clusters on SSDs, which brought our average request latency down to near zero milliseconds.
  18. We monitor Kafka in several different ways, with tooling developed by the SRE team. Lag monitoring: lag is the number of messages between the consumer's current (committed) offset and the newest message available in Kafka. Under-replicated partitions: the count of partitions whose follower replicas have fallen behind the leader; this metric is reported per broker, and in a healthy cluster it should always be zero. Unclean leader elections: these occur when a leader fails and no follower is in sync at that time, so an out-of-sync replica becomes leader; when this happens, data has been lost. Burrow is a tool developed and open sourced by one of the Kafka SREs at LinkedIn. It is our new way of monitoring lag within Kafka, using velocity calculations to determine whether a consumer is falling behind. We have also developed tooling to ensure that all brokers within a cluster do the same amount of work. In the size-based balance, we ensure that each broker has the same amount of data on disk; in the partition-based balance, we ensure that each broker has the same number of partitions. In both cases, if the brokers are not within our defined threshold, we move the optimal number of partitions around to balance them.
  19. Cluster types: User activities on LinkedIn sites are tracked, and this data flows into the tracking clusters. LinkedIn has multiple colos (data centers), and users are served from different colos based on their unique ID. The tracking data goes to the local tracking cluster. We also have aggregate clusters, which receive the tracking data from the multiple colos using MirrorMaker, and the downstream applications that process the tracking data consume from the aggregate clusters. Operating systems and applications generate metrics, which are used for understanding the state of the system. These values are pumped into a separate metrics cluster (more about metrics on the next slide). The queuing cluster is used for traditional queuing scenarios, when you have multiple applications and want to coordinate their activities.
  20. At LinkedIn we use Kafka for pumping metrics into our graphing engine, InGraphs. The basic idea is that services expose a certain set of metrics using MBeans, which are picked up by sensors, processed, and pumped into Kafka. These enriched metrics are all consumed by a service that filters metrics by tags and pushes the data into RRDs. These RRDs are used to generate the graphs served to the end user.
  21. This is just a sample screenshot of the final graphs in InGraphs. Different colors correspond to different hosts.
  22. One new use case for Kafka at LinkedIn is database replication. This diagram shows how it is done. The database on the left streams its transaction log into Kafka. The Data Replicator consumes the transaction log stream from Kafka and replays it into the database on the right. This is a great method for cross-datacenter replication of databases. One obvious advantage over traditional master-slave database replication is that the two databases are decoupled. To start the secondary database initially, you first create a backup snapshot of the data in DB1 and load it into DB2. After that, DB2 can follow the transaction log stream via the Data Replicator and stay in sync.
  23. This also works for a master-master relationship, where you stream the transactions originating in the second colo back to the database in the first colo. Additional filtering logic is added to the Data Replicator to ensure that a loop is not created: a transaction originating in colo A must be mirrored to colo B, but must not be replicated back to colo A.
  24. So how can you get more involved in the Kafka community? The most obvious answer is to go to kafka.apache.org. From there you can: 1) join the mailing lists, on either the development or the user side; 2) dive into the source repository, work on it, and contribute your own tools back.