Apache Kafka
Copyright @ 2019 Learntek. All Rights Reserved. 3
Apache Kafka
Data analytics is often described as one of the biggest challenges associated with
big data, but even before that step can happen, data must be ingested and made
available to enterprise users. That’s where Apache Kafka comes in. Kafka’s growth
is exploding: more than a third of all Fortune 500 companies use Kafka. These
companies include the top ten travel companies, seven of the top ten banks, eight of the top ten
insurance companies, nine of the top ten telecom companies, and many more. LinkedIn,
Microsoft and Netflix process "four-comma" message volumes with Kafka, that is, more than
1,000,000,000,000 (one trillion) messages a day.
Introduction:
Apache Kafka is a streaming platform for collecting, storing, and processing high
volumes of data in real time. It is a highly scalable, fast and fault-tolerant
messaging system used for streaming applications and data processing, and it is
written in Java and Scala.
As a distributed data streaming platform, Kafka can publish, subscribe
to, store, and process streams of records in real time. It is designed to handle
data streams from multiple sources and deliver them to multiple consumers. In
short, it moves massive amounts of data, not just from point A to B, but from
points A to Z and anywhere else you need, all at the same time.
Apache Kafka started out as an internal system at LinkedIn, where it eventually grew
to handle 1.4 trillion messages per day; it is now an open-source data streaming solution
with applications across a variety of enterprise needs.
Features:
•Apache Kafka is a distributed publish-subscribe messaging system that is designed to
be fast, scalable, and durable
•Apache Kafka is designed for distributed high throughput systems
•Apache Kafka tends to work very well as a replacement for a more traditional
message broker
•Compared with most traditional message brokers, Apache Kafka has higher throughput, built-in
partitioning, replication and inherent fault tolerance, which makes it a good fit for
large-scale message processing applications
•Apache Kafka maintains feeds of messages in topics
•Producers write data to topics and consumers read from topics
•Since Kafka is a distributed system, topics are partitioned and replicated across
multiple nodes
•Kafka is very fast and, with replication configured appropriately, can survive broker failures without downtime or data loss.
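The relationship between topics, partitions, producers and consumers can be illustrated with a small in-memory model. This is a hypothetical sketch, not Kafka's client API: the Topic class, its methods, and the use of CRC32 for key hashing are assumptions made for illustration (Kafka's default Java partitioner hashes keys with murmur2). Replication across nodes is not modeled.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy in-memory topic split into partitions (hypothetical, not Kafka's API)."""
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self) -> None:
        # Each partition is an append-only list; a record's index is its offset.
        self.partitions = [[] for _ in range(self.num_partitions)]

    def partition_for(self, key: str) -> int:
        # Same key -> same partition, so per-key ordering is preserved.
        # CRC32 is a stand-in; Kafka's default Java partitioner uses murmur2.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def produce(self, key: str, value: str) -> tuple[int, int]:
        # Producer side: append to the chosen partition, return (partition, offset).
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def read(self, partition: int, offset: int) -> list:
        # Consumer side: reading from an offset does not remove records.
        return self.partitions[partition][offset:]


orders = Topic("orders", num_partitions=3)
print(orders.produce("customer-42", "order created"))
print(orders.produce("customer-42", "order paid"))
```

Both records for customer-42 land in the same partition with consecutive offsets, which is how keyed records keep their relative order; records with other keys may land in other partitions.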
Who uses Apache Kafka?
Many large companies that handle large volumes of data use Kafka. LinkedIn, where it
originated, uses it to track activity data and operational metrics. Twitter uses it with
Storm to provide a stream processing infrastructure. Square uses Kafka as a
bus to move all system events (logs, custom events, metrics, and so on) to various Square
data centers, to feed outputs to Splunk and Graphite (dashboards), and to implement
Esper-like/CEP alerting systems. Other companies such as Spotify, Uber, Tumblr,
Goldman Sachs, PayPal, Box, Cisco, CloudFlare and Netflix use it too, among many
more.
Why is Kafka so Fast?
Kafka relies heavily on the OS kernel to move data around quickly, using the
principle of zero copy. Kafka lets you batch data records into chunks, and these
batches flow end to end from the producer to the file system (the Kafka topic
log) to the consumer. Batching allows more efficient data compression and
reduces I/O overhead. Kafka appends to an immutable commit log on disk
sequentially, which avoids random disk access and slow disk seeks. Kafka also scales
horizontally through sharding: it splits a topic log into hundreds, potentially
thousands, of partitions spread across a cluster of servers. This sharding allows Kafka to
handle massive load.
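Why batching helps compression, and what "append-only" means, can be shown with a short sketch. It is a hypothetical model using Python's zlib, not Kafka's record-batch format or codecs: the event shape, batch size and the AppendOnlyLog class are assumptions, and zero-copy transfer is not modeled.

```python
import json
import zlib


def make_records(n: int) -> list[bytes]:
    # Hypothetical page-view events; similar records compress well together.
    return [
        json.dumps({"event": "page_view", "page": f"/item/{i % 10}", "user": i}).encode()
        for i in range(n)
    ]


def compressed_size_individually(records: list[bytes]) -> int:
    # Compress each record on its own, as if it were sent one at a time.
    return sum(len(zlib.compress(r)) for r in records)


def compressed_size_batched(records: list[bytes], batch_size: int) -> int:
    # Group records into batches and compress each batch as one unit.
    total = 0
    for start in range(0, len(records), batch_size):
        batch = b"\n".join(records[start:start + batch_size])
        total += len(zlib.compress(batch))
    return total


class AppendOnlyLog:
    """Toy commit log: compressed batches are only ever appended at the end."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.batch_positions: list[int] = []  # byte position where each batch starts

    def append_batch(self, records: list[bytes]) -> int:
        self.batch_positions.append(len(self.data))
        self.data += zlib.compress(b"\n".join(records))  # sequential write, no in-place update
        return self.batch_positions[-1]


records = make_records(1000)
print("individually:", compressed_size_individually(records))
print("batches of 100:", compressed_size_batched(records, 100))
```

Compressing many similar records together lets the compressor reuse repeated field names and values across records, and appending whole batches keeps disk writes sequential.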
Key Benefits:
Apache Kafka API:
Apache Kafka is popular with developers because it is easy to pick up and
provides a powerful event streaming platform with four core APIs: Producer,
Consumer, Streams, and Connect.
•Producer API: lets applications publish a stream of records to
one or more topics.
•Consumer API: lets applications subscribe to one or
more topics and process the stream of records produced to them.
•Streams API: lets an application act as a stream processor, consuming input from one or more
topics, transforming it, and producing output to one or more topics.
•Connect API: lets you build and run reusable producers and consumers (connectors)
that link topics to existing applications or data systems.
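The division of labor between the Producer, Consumer and Streams roles can be sketched with a toy broker. Everything here is hypothetical (Broker, produce, consume and run_stream_step are illustrative names, not Kafka's API), each topic is a single in-memory partition, and the Connect API is not modeled.

```python
from collections import defaultdict


class Broker:
    """Toy broker: topic name -> append-only list of (key, value) records."""

    def __init__(self) -> None:
        self.topics = defaultdict(list)


def produce(broker, topic, key, value):
    # Producer role: append a record to a topic and return its offset.
    broker.topics[topic].append((key, value))
    return len(broker.topics[topic]) - 1


def consume(broker, topic, offset):
    # Consumer role: return records from `offset` onward and the next offset to read.
    records = broker.topics[topic][offset:]
    return records, offset + len(records)


def run_stream_step(broker, source, sink, transform, offset):
    # Streams role: consume from an input topic, transform each value,
    # and produce the results to an output topic.
    records, next_offset = consume(broker, source, offset)
    for key, value in records:
        produce(broker, sink, key, transform(value))
    return next_offset


broker = Broker()
produce(broker, "raw-clicks", "u1", "home")
produce(broker, "raw-clicks", "u2", "cart")
next_offset = run_stream_step(broker, "raw-clicks", "clicks-upper", str.upper, 0)
print(next_offset, broker.topics["clicks-upper"])  # prints 2 [('u1', 'HOME'), ('u2', 'CART')]
```

Because the stream step returns the next offset, calling it again later processes only the records that arrived in the meantime.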
Need for Apache Kafka:
•Kafka is a unified platform for handling all the real-time data feeds
•Kafka supports low-latency message delivery and provides fault tolerance in
the presence of machine failures
•It can handle a large number of diverse consumers
•Kafka is very fast; it has been benchmarked at about 2 million writes/sec on a three-machine cluster
•Kafka persists all data to disk, but writes first go to the OS page cache (RAM)
and are flushed to disk by the OS
•This makes it very efficient to transfer data from the page cache to a network socket
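Kafka can serve many diverse consumers because reading does not delete data: each consumer group tracks its own position (offset) in the log. The sketch below is a simplified, hypothetical model of that idea; RetainedLog, poll and the group names are illustrative, and the offset is "committed" immediately after each read, which is a simplification of Kafka's commit behavior.

```python
class RetainedLog:
    """Toy log where reads never delete data; each consumer group keeps its own offset."""

    def __init__(self) -> None:
        self.records: list[str] = []
        self.committed: dict[str, int] = {}  # consumer group -> next offset to read

    def append(self, value: str) -> None:
        self.records.append(value)

    def poll(self, group: str, max_records: int) -> list[str]:
        # Each group starts from its own committed offset (0 if it is new).
        start = self.committed.get(group, 0)
        batch = self.records[start:start + max_records]
        self.committed[group] = start + len(batch)  # simplified "commit" right after reading
        return batch


log = RetainedLog()
for i in range(5):
    log.append(f"event-{i}")
print(log.poll("dashboard", 5))      # ['event-0', 'event-1', 'event-2', 'event-3', 'event-4']
print(log.poll("hadoop-loader", 2))  # ['event-0', 'event-1']
```

A fast real-time dashboard and a slower batch loader read the same records at their own pace without interfering with each other.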
Apache Kafka – Use Cases:
Kafka can be used in many use cases. Some of them are listed below:
•Metrics: Kafka is often used for operational monitoring data. This involves
aggregating statistics from distributed applications to produce centralized feeds of
operational data.
•Twitter: Registered users can read and post tweets, but unregistered users can
only read tweets. Twitter uses Storm-Kafka as part of its stream processing
infrastructure.
•Netflix: An American multinational provider of on-demand Internet streaming
media, Netflix uses Kafka for real-time monitoring and event processing.
•Log Aggregation Solution: Kafka can be used across an organization to collect
logs from multiple services and make them available in a standard format to multiple
consumers.
•LinkedIn: Apache Kafka is used at LinkedIn for activity stream data and operational
metrics. The Kafka messaging system supports various LinkedIn products, such as LinkedIn
Newsfeed and LinkedIn Today, for online message consumption, in addition to feeding offline
analytics systems like Hadoop.
•Stream Processing: Popular frameworks such as Storm and Spark Streaming read
data from a topic, process it, and write the processed data to a new topic where it
becomes available for users and applications. Kafka’s strong durability is also very
useful in the context of stream processing.
•Website Activity Tracking: The web application sends events such as page
views and searches to Kafka, where they become available for real-time processing,
dashboards and offline analytics in Hadoop.
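The metrics and website-activity use cases both come down to folding a stream of events into aggregated feeds. As a hypothetical sketch (the event fields ts, type and page, and the 60-second window, are assumptions, and this is plain Python rather than Kafka Streams or Spark), page views can be counted per page in fixed, non-overlapping ("tumbling") time windows:

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds=60):
    """Count page views per (window start, page) from hypothetical event dicts."""
    counts = Counter()
    for e in events:
        if e["type"] != "page_view":
            continue  # other activity, such as searches, is ignored here
        window_start = e["ts"] - (e["ts"] % window_seconds)
        counts[(window_start, e["page"])] += 1
    return counts


events = [
    {"ts": 5, "type": "page_view", "page": "/home"},
    {"ts": 30, "type": "page_view", "page": "/home"},
    {"ts": 61, "type": "page_view", "page": "/home"},
    {"ts": 62, "type": "search", "page": "/search"},
]
print(tumbling_window_counts(events))
```

In a real pipeline, the events would be consumed from a topic and the per-window counts written to another topic for dashboards; windowing keeps the output bounded no matter how long the stream runs.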
For more training information, contact us
Email : info@learntek.org
USA : +1734 418 2465
INDIA : +40 4018 1306
+7799713624