SlideShare a Scribd company logo
1 of 25
APACHE KAFKA
WHAT IS KAFKA?
• KAFKA is a distributed messaging system that was
originally developed at LinkedIn to serve as the
foundation for LinkedIn's activity stream and operational
data processing pipeline. It is now used at a variety of
different companies for various data pipeline and
messaging uses.
• LinkedIn developed Kafka for collecting and delivering
high volumes of log data with low latency for real-time
log processing.
• Scala and Zookeeper based.
WHAT IS KAFKA? CONTD..
• KAFKA can be used for both ONLINE(realtime) and
OFFLINE (integration with hadoop) analysis of log data.
• Distributed and scalable .
• High Throughput.
• More Priority to efficiency then fancy features.
• Simple API.
• Low Overhead.
APACHE KAFKA
=
+LOG AGGREGATOR MESSAGING SYSTEM
Where to use APACHE KAFKA?
• user activity events corresponding to logins, page-views,
clicks, “likes” ,sharing, comments, and search queries
• operational metrics (health monitoring)
• search relevance.
• Recommendations.
• Ad targeting and reporting.
• Protection against abnormal behavior.
• newsfeed
• Etc.
FLAWS in Traditional Messaging Systems.
• More focus on delivery guarantee.
• Less focus on throughput.
• Weak in distributed support.
• Performance degrades as Queue increases.
KAFKA DESIGN
PRINCIPLES
ARCHITECTURE
ARCHITECTURE contd..
• A stream of messages of particular type is defined by a topic.
• A topic is then divided into multiple partitions
• Each partition is a directory that has list of files and the files store
the message data.
• A producer can publish messages to a topic (push).
• Published messages are then stored on a set of servers called
brokers.
• Each broker stores one or more partitions
• Consumers can subscribe to any of the topics and can consume
messages by pulling data from the brokers.
• Supports both Point-to-point Delivery model and Publish/subscribe
Delivery model.
• Uses “Pull Based” system model.
WHY “PULL MODEL”?
• A “push-based” system has difficulty dealing with diverse
consumers as the broker controls the rate at which data
is transferred, (DOS attack = loss of data).
• In a Pull based system, a consumer simply falls behind
and catches up when it can.
• Consumer can rewind back to consume old messages.
DEPLOYMENT
Storage
• Simple Storage
• One each partition corresponds to a logical log.(a log ==
a list of files.)
• Physically a log is implemented using segmentation
where each segment is approximately of equal size.
• When a message is published , the broker simply
appends the message to the last segment file.
• Messages are flushed on to disk in a batch for better
performance.
• Messages is only exposed to consumers after they are
flushed.
• Messages are addressed by log offset. #lessoverhead.
Storage contd..
• Id of next message = length of current message + id of current
message.
Efficient Transfer
• Producer can submit multiple messages in a single send
request.( #End-to-endmessagebatching).
• Each pull request can consume multiple messages up to
a certain size.
• No caching of messages on the Kafta process layer.
Messages are only cached in the page cache, --“no
double buffering”. Hence very little overhead in garbage
collecting its memory.(#filesystemcaching)#lessoverhead
• Producer and Consumer access the data files
Sequentially. (disks are fast when accessed
sequentially).
Efficient Transfer :from local file to remote
socket
1. read data from the storage media to the page cache in an OS.
2. copy data in the page cache to an application buffer.
3. copy application buffer to another kernel buffer.
4. send the kernel buffer to the socket
• This process usually takes 4 data copying and one system call.
SENDFILE API
avoids 2 copies call and 1 system call
#zerocopytransfer
broker consumer
Stateless Broker
• Consumer maintains its own state hence reducing
complexity and overhead on the broker.
• Disadvantage is that broker doesn’t know whether all
subscriber’s have consumed the message , this problem
is solved by using a simple time based SLA for the
retention policy. A message is automatically deleted if it
has been
• A consumer can deliberately rewind back to an old offset
and re-consume data. #violateslogicofqueue.
Distributed Model
Producer 1 Producer 2
Broker 1 Broker 2 Broker 3
Consumer 1 Consumer 2
Zookeeper
Distributed Model contd..
• Consumer Group consists of one or more consumers.
• All messages from a partition are consumed by a single
consumer in a consumer group. #lessoverheadonbroker.
• No master node  less complexity and no master
failures to worry about.
• Zookeeper co-ordinates between the producers,
consumers and brokers.
Zookeeper functions.
• Detection of addition/ removal of brokers, producers, consumers.
• Triggering rebalance process in effect to above detection.
• Tracking.
Broker registry hostname, port and
set of topics of broker.
ephemeral
Consumer Registry consumer group Ephemeral
Owner registry consumer currently
consuming a partition
persistent
Offset registry stores the offset of last
consumed partition for
each subscribed
partition.
persistent
Delivery Guarantee
• At least once delivery. #cancauseduplication.#cost-
effective
• Messages from single partition in order.
• Messages from multiple partitions not necessarily in
order. #noguarantee
• CRC for each message in the logs #avoidlogcorruption
• If I/O error on broker then kafka removes messages with
inconsistent CRCs.
• If the storage system is completely damaged and
consumer have not consumed the message then the
message is lost forever. #futureplanstoaddreplication.
Recent Developments
• End-to-End Batch Level Compression.
• Improved Stream Processing Libraries.
• Hadoop Consumer.
• Hadoop Producer.
Apache Kafka @
References.
• http://incubator.apache.org/kafka/
• http://research.microsoft.com/en-us/um/people/srikan
• http://vimeo.com/27592622

More Related Content

What's hot

[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions
[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions
[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and SolutionsWSO2
 
Azure Application insights - An Introduction
Azure Application insights - An IntroductionAzure Application insights - An Introduction
Azure Application insights - An IntroductionMatthias Güntert
 
Real User Monitoring (RUM)
Real User Monitoring (RUM)Real User Monitoring (RUM)
Real User Monitoring (RUM)Site24x7
 
Site24x7 Cloud Monitoring
Site24x7 Cloud MonitoringSite24x7 Cloud Monitoring
Site24x7 Cloud MonitoringSite24x7
 
[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...
[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...
[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...WSO2
 
Modernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoringModernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoringManageEngine, Zoho Corporation
 
Cloud Bursting with A10 Lightning ADS
Cloud Bursting with A10 Lightning ADSCloud Bursting with A10 Lightning ADS
Cloud Bursting with A10 Lightning ADSAkshay Mathur
 
"What database can tell about application issues? What application can tell a...
"What database can tell about application issues? What application can tell a..."What database can tell about application issues? What application can tell a...
"What database can tell about application issues? What application can tell a...Fwdays
 
One Azure Monitor to Rule Them All? - Marius Zaharia
One Azure Monitor to Rule Them All? - Marius ZahariaOne Azure Monitor to Rule Them All? - Marius Zaharia
One Azure Monitor to Rule Them All? - Marius ZahariaITCamp
 
Azure API Manegement Introduction and Integeration with BizTalk
Azure API Manegement Introduction and Integeration with BizTalkAzure API Manegement Introduction and Integeration with BizTalk
Azure API Manegement Introduction and Integeration with BizTalkShailesh Dwivedi
 
Full Stack Reactive In Practice
Full Stack Reactive In PracticeFull Stack Reactive In Practice
Full Stack Reactive In PracticeLightbend
 
Nava SIEM Agent Datasheet
Nava SIEM Agent DatasheetNava SIEM Agent Datasheet
Nava SIEM Agent DatasheetLinkgard
 
Restful Asynchronous Notification
Restful Asynchronous NotificationRestful Asynchronous Notification
Restful Asynchronous NotificationMichael Koster
 
Cloud monitoring
Cloud monitoringCloud monitoring
Cloud monitoringGang Tao
 
Building Lightweight Microservices With Redis & Hydra
Building Lightweight Microservices With Redis & HydraBuilding Lightweight Microservices With Redis & Hydra
Building Lightweight Microservices With Redis & HydraRedis Labs
 
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...Flink Forward
 
Keynote : évolution et vision d'Elastic Observability
Keynote : évolution et vision d'Elastic ObservabilityKeynote : évolution et vision d'Elastic Observability
Keynote : évolution et vision d'Elastic ObservabilityElasticsearch
 

What's hot (20)

[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions
[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions
[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions
 
Blr hadoop meetup
Blr hadoop meetupBlr hadoop meetup
Blr hadoop meetup
 
Monitor Cloud Resources using Alerts & Insights
Monitor Cloud Resources using Alerts & InsightsMonitor Cloud Resources using Alerts & Insights
Monitor Cloud Resources using Alerts & Insights
 
Azure Application insights - An Introduction
Azure Application insights - An IntroductionAzure Application insights - An Introduction
Azure Application insights - An Introduction
 
Real User Monitoring (RUM)
Real User Monitoring (RUM)Real User Monitoring (RUM)
Real User Monitoring (RUM)
 
Site24x7 Cloud Monitoring
Site24x7 Cloud MonitoringSite24x7 Cloud Monitoring
Site24x7 Cloud Monitoring
 
[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...
[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...
[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...
 
Modernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoringModernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoring
 
Cloud Bursting with A10 Lightning ADS
Cloud Bursting with A10 Lightning ADSCloud Bursting with A10 Lightning ADS
Cloud Bursting with A10 Lightning ADS
 
"What database can tell about application issues? What application can tell a...
"What database can tell about application issues? What application can tell a..."What database can tell about application issues? What application can tell a...
"What database can tell about application issues? What application can tell a...
 
One Azure Monitor to Rule Them All? - Marius Zaharia
One Azure Monitor to Rule Them All? - Marius ZahariaOne Azure Monitor to Rule Them All? - Marius Zaharia
One Azure Monitor to Rule Them All? - Marius Zaharia
 
Azure API Manegement Introduction and Integeration with BizTalk
Azure API Manegement Introduction and Integeration with BizTalkAzure API Manegement Introduction and Integeration with BizTalk
Azure API Manegement Introduction and Integeration with BizTalk
 
Full Stack Reactive In Practice
Full Stack Reactive In PracticeFull Stack Reactive In Practice
Full Stack Reactive In Practice
 
implementing the right website monitoring strategy
 implementing the right website monitoring strategy implementing the right website monitoring strategy
implementing the right website monitoring strategy
 
Nava SIEM Agent Datasheet
Nava SIEM Agent DatasheetNava SIEM Agent Datasheet
Nava SIEM Agent Datasheet
 
Restful Asynchronous Notification
Restful Asynchronous NotificationRestful Asynchronous Notification
Restful Asynchronous Notification
 
Cloud monitoring
Cloud monitoringCloud monitoring
Cloud monitoring
 
Building Lightweight Microservices With Redis & Hydra
Building Lightweight Microservices With Redis & HydraBuilding Lightweight Microservices With Redis & Hydra
Building Lightweight Microservices With Redis & Hydra
 
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...
 
Keynote : évolution et vision d'Elastic Observability
Keynote : évolution et vision d'Elastic ObservabilityKeynote : évolution et vision d'Elastic Observability
Keynote : évolution et vision d'Elastic Observability
 

Viewers also liked

3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...Timothy McCormick
 
3425 - Using publish/subscribe to integrate applications
3425 - Using publish/subscribe to integrate applications3425 - Using publish/subscribe to integrate applications
3425 - Using publish/subscribe to integrate applicationsTimothy McCormick
 
Introducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache KafkaIntroducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache KafkaAndrew Schofield
 
#1922 rest-push2 ap-im-v6
#1922 rest-push2 ap-im-v6#1922 rest-push2 ap-im-v6
#1922 rest-push2 ap-im-v6Jack Carnes
 
HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...
 HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen... HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...
HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...Matt Leming
 
3429 How to transform your messaging environment to a secure messaging envi...
3429   How to transform your messaging environment to a secure messaging envi...3429   How to transform your messaging environment to a secure messaging envi...
3429 How to transform your messaging environment to a secure messaging envi...Robert Parker
 
WhatsNewIBMIntegrationBus10FP4
WhatsNewIBMIntegrationBus10FP4WhatsNewIBMIntegrationBus10FP4
WhatsNewIBMIntegrationBus10FP4bthomps1979
 
ConnectorsForIntegration
ConnectorsForIntegrationConnectorsForIntegration
ConnectorsForIntegrationbthomps1979
 
IBM Messaging Security - Why securing your environment is important : IBM Int...
IBM Messaging Security - Why securing your environment is important : IBM Int...IBM Messaging Security - Why securing your environment is important : IBM Int...
IBM Messaging Security - Why securing your environment is important : IBM Int...Leif Davidsen
 
Hia 1691-using iib-to_support_api_economy
Hia 1691-using iib-to_support_api_economyHia 1691-using iib-to_support_api_economy
Hia 1691-using iib-to_support_api_economyAndrew Coleman
 
Hia 1689-techinical introduction-to_iib
Hia 1689-techinical introduction-to_iibHia 1689-techinical introduction-to_iib
Hia 1689-techinical introduction-to_iibAndrew Coleman
 
Java zone 2015 How to make life with kafka easier.
Java zone 2015 How to make life with kafka easier.Java zone 2015 How to make life with kafka easier.
Java zone 2015 How to make life with kafka easier.Krzysztof Debski
 
Microservices: Where do they fit within a rapidly evolving integration archit...
Microservices: Where do they fit within a rapidly evolving integration archit...Microservices: Where do they fit within a rapidly evolving integration archit...
Microservices: Where do they fit within a rapidly evolving integration archit...Kim Clark
 

Viewers also liked (13)

3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...
 
3425 - Using publish/subscribe to integrate applications
3425 - Using publish/subscribe to integrate applications3425 - Using publish/subscribe to integrate applications
3425 - Using publish/subscribe to integrate applications
 
Introducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache KafkaIntroducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
 
#1922 rest-push2 ap-im-v6
#1922 rest-push2 ap-im-v6#1922 rest-push2 ap-im-v6
#1922 rest-push2 ap-im-v6
 
HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...
 HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen... HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...
HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...
 
3429 How to transform your messaging environment to a secure messaging envi...
3429   How to transform your messaging environment to a secure messaging envi...3429   How to transform your messaging environment to a secure messaging envi...
3429 How to transform your messaging environment to a secure messaging envi...
 
WhatsNewIBMIntegrationBus10FP4
WhatsNewIBMIntegrationBus10FP4WhatsNewIBMIntegrationBus10FP4
WhatsNewIBMIntegrationBus10FP4
 
ConnectorsForIntegration
ConnectorsForIntegrationConnectorsForIntegration
ConnectorsForIntegration
 
IBM Messaging Security - Why securing your environment is important : IBM Int...
IBM Messaging Security - Why securing your environment is important : IBM Int...IBM Messaging Security - Why securing your environment is important : IBM Int...
IBM Messaging Security - Why securing your environment is important : IBM Int...
 
Hia 1691-using iib-to_support_api_economy
Hia 1691-using iib-to_support_api_economyHia 1691-using iib-to_support_api_economy
Hia 1691-using iib-to_support_api_economy
 
Hia 1689-techinical introduction-to_iib
Hia 1689-techinical introduction-to_iibHia 1689-techinical introduction-to_iib
Hia 1689-techinical introduction-to_iib
 
Java zone 2015 How to make life with kafka easier.
Java zone 2015 How to make life with kafka easier.Java zone 2015 How to make life with kafka easier.
Java zone 2015 How to make life with kafka easier.
 
Microservices: Where do they fit within a rapidly evolving integration archit...
Microservices: Where do they fit within a rapidly evolving integration archit...Microservices: Where do they fit within a rapidly evolving integration archit...
Microservices: Where do they fit within a rapidly evolving integration archit...
 

Similar to Apache kafka- Onkar Kadam

Distributed messaging with Apache Kafka
Distributed messaging with Apache KafkaDistributed messaging with Apache Kafka
Distributed messaging with Apache KafkaSaumitra Srivastav
 
Removing dependencies between services: Messaging and Apache Kafka
Removing dependencies between services: Messaging and Apache KafkaRemoving dependencies between services: Messaging and Apache Kafka
Removing dependencies between services: Messaging and Apache KafkaDaniel Muñoz Garrido
 
Building an Event Bus at Scale
Building an Event Bus at ScaleBuilding an Event Bus at Scale
Building an Event Bus at Scalejimriecken
 
apachekafka-160907180205.pdf
apachekafka-160907180205.pdfapachekafka-160907180205.pdf
apachekafka-160907180205.pdfTarekHamdi8
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka IntroductionAmita Mirajkar
 
kafka simplicity and complexity
kafka simplicity and complexitykafka simplicity and complexity
kafka simplicity and complexityPaolo Platter
 
Zookeeper Tutorial for beginners
Zookeeper Tutorial for beginnersZookeeper Tutorial for beginners
Zookeeper Tutorial for beginnersjeetendra mandal
 
Kafka Overview
Kafka OverviewKafka Overview
Kafka Overviewiamtodor
 
Introduction to Microservices
Introduction to MicroservicesIntroduction to Microservices
Introduction to MicroservicesMahmoudZidan41
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaAngelo Cesaro
 
OnPrem Monitoring.pdf
OnPrem Monitoring.pdfOnPrem Monitoring.pdf
OnPrem Monitoring.pdfTarekHamdi8
 
Unleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptxUnleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptxKnoldus Inc.
 
AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...
AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...
AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...QCloudMentor
 
Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)
Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)
Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)Ontico
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaJeff Holoman
 
WSO2 Message Broker - Product Overview
WSO2 Message Broker - Product OverviewWSO2 Message Broker - Product Overview
WSO2 Message Broker - Product OverviewWSO2
 
Microservice message routing on Kubernetes
Microservice message routing on KubernetesMicroservice message routing on Kubernetes
Microservice message routing on KubernetesFrans van Buul
 

Similar to Apache kafka- Onkar Kadam (20)

Distributed messaging with Apache Kafka
Distributed messaging with Apache KafkaDistributed messaging with Apache Kafka
Distributed messaging with Apache Kafka
 
Removing dependencies between services: Messaging and Apache Kafka
Removing dependencies between services: Messaging and Apache KafkaRemoving dependencies between services: Messaging and Apache Kafka
Removing dependencies between services: Messaging and Apache Kafka
 
Building an Event Bus at Scale
Building an Event Bus at ScaleBuilding an Event Bus at Scale
Building an Event Bus at Scale
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka tutorial
Kafka tutorialKafka tutorial
Kafka tutorial
 
apachekafka-160907180205.pdf
apachekafka-160907180205.pdfapachekafka-160907180205.pdf
apachekafka-160907180205.pdf
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
 
kafka simplicity and complexity
kafka simplicity and complexitykafka simplicity and complexity
kafka simplicity and complexity
 
Zookeeper Tutorial for beginners
Zookeeper Tutorial for beginnersZookeeper Tutorial for beginners
Zookeeper Tutorial for beginners
 
Kafka Overview
Kafka OverviewKafka Overview
Kafka Overview
 
Introduction to Microservices
Introduction to MicroservicesIntroduction to Microservices
Introduction to Microservices
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
 
OnPrem Monitoring.pdf
OnPrem Monitoring.pdfOnPrem Monitoring.pdf
OnPrem Monitoring.pdf
 
Unleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptxUnleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptx
 
AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...
AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...
AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...
 
Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)
Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)
Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
WSO2 Message Broker - Product Overview
WSO2 Message Broker - Product OverviewWSO2 Message Broker - Product Overview
WSO2 Message Broker - Product Overview
 
Microservices deck
Microservices deckMicroservices deck
Microservices deck
 
Microservice message routing on Kubernetes
Microservice message routing on KubernetesMicroservice message routing on Kubernetes
Microservice message routing on Kubernetes
 

Recently uploaded

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 

Recently uploaded (20)

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 

Apache kafka- Onkar Kadam

  • 2. WHAT IS KAFKA? • KAFKA is a distributed messaging system that was originally developed at LinkedIn to serve as the foundation for LinkedIn's activity stream and operational data processing pipeline. It is now used at a variety of different companies for various data pipeline and messaging uses. • LinkedIn developed Kafka for collecting and delivering high volumes of log data with low latency for real-time log processing. • Scala and Zookeeper based.
  • 3. WHAT IS KAFKA? CONTD.. • KAFKA can be used for both ONLINE(realtime) and OFFLINE (integration with hadoop) analysis of log data. • Distributed and scalable . • High Throughput. • More Priority to efficiency then fancy features. • Simple API. • Low Overhead.
  • 5. Where to use APACHE KAFKA? • user activity events corresponding to logins, page-views, clicks, “likes” ,sharing, comments, and search queries • operational metrics (health monitoring) • search relevance. • Recommendations. • Ad targeting and reporting. • Protection against abnormal behavior. • newsfeed • Etc.
  • 6. FLAWS in Traditional Messaging Systems. • More focus on delivery guarantee. • Less focus on throughput. • Weak in distributed support. • Performance degrades as Queue increases.
  • 9. ARCHITECTURE contd.. • A stream of messages of particular type is defined by a topic. • A topic is then divided into multiple partitions • Each partition is a directory that has list of files and the files store the message data. • A producer can publish messages to a topic (push). • Published messages are then stored on a set of servers called brokers. • Each broker stores one or more partitions • Consumers can subscribe to any of the topics and can consume messages by pulling data from the brokers. • Supports both Point-to-point Delivery model and Publish/subscribe Delivery model. • Uses “Pull Based” system model.
  • 10. WHY “PULL MODEL”? • A “push-based” system has difficulty dealing with diverse consumers as the broker controls the rate at which data is transferred, (DOS attack = loss of data). • In a Pull based system, a consumer simply falls behind and catches up when it can. • Consumer can rewind back to consume old messages.
  • 12.
  • 13.
  • 14. Storage • Simple Storage • One each partition corresponds to a logical log.(a log == a list of files.) • Physically a log is implemented using segmentation where each segment is approximately of equal size. • When a message is published , the broker simply appends the message to the last segment file. • Messages are flushed on to disk in a batch for better performance. • Messages is only exposed to consumers after they are flushed. • Messages are addressed by log offset. #lessoverhead.
  • 15. Storage contd.. • Id of next message = length of current message + id of current message.
  • 16. Efficient Transfer • Producer can submit multiple messages in a single send request.( #End-to-endmessagebatching). • Each pull request can consume multiple messages up to a certain size. • No caching of messages on the Kafta process layer. Messages are only cached in the page cache, --“no double buffering”. Hence very little overhead in garbage collecting its memory.(#filesystemcaching)#lessoverhead • Producer and Consumer access the data files Sequentially. (disks are fast when accessed sequentially).
  • 17. Efficient Transfer :from local file to remote socket 1. read data from the storage media to the page cache in an OS. 2. copy data in the page cache to an application buffer. 3. copy application buffer to another kernel buffer. 4. send the kernel buffer to the socket • This process usually takes 4 data copying and one system call. SENDFILE API avoids 2 copies call and 1 system call #zerocopytransfer broker consumer
  • 18. Stateless Broker • Consumer maintains its own state hence reducing complexity and overhead on the broker. • Disadvantage is that broker doesn’t know whether all subscriber’s have consumed the message , this problem is solved by using a simple time based SLA for the retention policy. A message is automatically deleted if it has been • A consumer can deliberately rewind back to an old offset and re-consume data. #violateslogicofqueue.
  • 19. Distributed Model Producer 1 Producer 2 Broker 1 Broker 2 Broker 3 Consumer 1 Consumer 2 Zookeeper
  • 20. Distributed Model contd.. • Consumer Group consists of one or more consumers. • All messages from a partition are consumed by a single consumer in a consumer group. #lessoverheadonbroker. • No master node  less complexity and no master failures to worry about. • Zookeeper co-ordinates between the producers, consumers and brokers.
  • 21. Zookeeper functions. • Detection of addition/ removal of brokers, producers, consumers. • Triggering rebalance process in effect to above detection. • Tracking. Broker registry hostname, port and set of topics of broker. ephemeral Consumer Registry consumer group Ephemeral Owner registry consumer currently consuming a partition persistent Offset registry stores the offset of last consumed partition for each subscribed partition. persistent
  • 22. Delivery Guarantee • At least once delivery. #cancauseduplication.#cost- effective • Messages from single partition in order. • Messages from multiple partitions not necessarily in order. #noguarantee • CRC for each message in the logs #avoidlogcorruption • If I/O error on broker then kafka removes messages with inconsistent CRCs. • If the storage system is completely damaged and consumer have not consumed the message then the message is lost forever. #futureplanstoaddreplication.
  • 23. Recent Developments • End-to-End Batch Level Compression. • Improved Stream Processing Libraries. • Hadoop Consumer. • Hadoop Producer.