Apache Kafka 
The current and future 
http://kafka.apache.org/
Joe Stein 
• Developer, Architect & Technologist 
• Founder & Principal Consultant => Big Data Open Source Security LLC - http://stealth.ly 
Big Data Open Source Security LLC provides professional services and product solutions for the collection, 
storage, transfer, real-time analytics, batch processing and reporting for complex data streams, data sets and 
distributed systems. BDOSS is all about the "glue" and helping companies to not only figure out what Big Data 
Infrastructure Components to use but also how to change their existing (or build new) systems to work with 
them. 
• Apache Kafka Committer & PMC member 
• Blog & Podcast - http://allthingshadoop.com 
• Twitter @allthingshadoop
Apache Kafka 
• Apache Kafka 
o http://kafka.apache.org 
• Apache Kafka Source Code 
o https://github.com/apache/kafka 
• Documentation 
o http://kafka.apache.org/documentation.html 
• FAQ 
o https://cwiki.apache.org/confluence/display/KAFKA/FAQ 
• Wiki 
o https://cwiki.apache.org/confluence/display/KAFKA/Index
It often starts with just one data pipeline
Reuse of data pipelines for new providers
Reuse of existing providers for new consumers
Eventually the solution becomes the problem
And then it gets worse!
Kafka decouples data-pipelines
Topics & Partitions
A high-throughput distributed messaging system 
rethought as a distributed commit log.
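To make the "distributed commit log" idea concrete, here is a minimal in-memory sketch of a topic split into append-only partitions. It is only an illustration: the `Topic` class, its method names, and the CRC32-based key hashing are assumptions made for this example, not Kafka's implementation or its partitioner.

```python
import zlib


class Topic:
    """A toy in-memory topic: a fixed set of append-only partition logs."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stable hash, so the same key always maps to the same partition.
        # (CRC32 is an assumption for this sketch, not Kafka's partitioner.)
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        """Append a record and return (partition, offset)."""
        p = self.partition_for(key)
        log = self.partitions[p]
        log.append((key, value))
        return p, len(log) - 1

    def read(self, partition, offset, max_records=10):
        """Return records starting at offset; the log is left untouched."""
        return self.partitions[partition][offset:offset + max_records]


topic = Topic(num_partitions=3)
p_first, off_first = topic.append("user-1", "login")
p_second, off_second = topic.append("user-1", "click")
```

Because both records share a key, they land in the same partition with consecutive offsets, and a reader can fetch them repeatedly from any offset it chooses.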
LinkedIn’s Kafka Clusters
Where we were when things started
Replication response times
Consumer Throughput
New (0.8.2-beta) JVM Producer 
1 producer, replication x 3 async: 786,980 records/sec (75.1 MB/sec) 
1 producer, replication x 3 sync: 421,823 records/sec (40.2 MB/sec) 
3 producers, replication x 3 async: 2,024,032 records/sec (193.0 MB/sec) 
End-to-end latency 
2 ms (median) 
3 ms (99th percentile) 
14 ms (99.9th percentile)
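The producer throughput figures above pair a record rate with a byte rate, so the record size used in the benchmark can be estimated by dividing one by the other. The sketch below does that arithmetic. It assumes that "MB" on the slide means 2**20 bytes; the slide does not state the record size or the unit convention.

```python
MIB = 1024 * 1024  # assumption: the slide's "MB" means 2**20 bytes

# (records per second, MB per second) as quoted on the slide
results = {
    "1 producer, async": (786_980, 75.1),
    "1 producer, sync": (421_823, 40.2),
    "3 producers, async": (2_024_032, 193.0),
}


def bytes_per_record(records_per_sec, mb_per_sec):
    """Estimate the average record size from the two throughput figures."""
    return mb_per_sec * MIB / records_per_sec


record_sizes = {
    name: bytes_per_record(records, mb) for name, (records, mb) in results.items()
}

for name, size in record_sizes.items():
    print(f"{name}: ~{size:.1f} bytes/record")
```

Under that assumption, all three runs work out to roughly 100 bytes per record.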
Message Size vs Throughput (count)
Message Size vs Throughput (MB)
Recap 
• Producers - ** push ** 
o Batching 
o Compression 
o Sync (Ack), Async (auto batch) 
o Replication for durability and fault tolerance 
o Sequential writes, guaranteed ordering within each partition 
• Consumers - ** pull ** 
o No state held by broker 
o Consumers control reading from the stream 
• Zero Copy for producers and consumers to and from the broker 
http://kafka.apache.org/documentation.html#maximizingefficiency 
• Messages stay on disk after they are consumed; they are deleted by TTL (time-based retention) or by compaction 
https://kafka.apache.org/documentation.html#compaction
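The last recap point, that messages are removed by TTL or compaction rather than by being consumed, can be sketched in a few lines. This is a simplified model under stated assumptions: the `Record` tuple and both function names are hypothetical, and a real broker applies retention to log segments rather than to individual records.

```python
from collections import namedtuple

# Hypothetical record shape for this sketch.
Record = namedtuple("Record", ["offset", "timestamp", "key", "value"])


def expire(log, now, ttl_seconds):
    """Time-based retention: keep only records no older than ttl_seconds."""
    return [rec for rec in log if now - rec.timestamp <= ttl_seconds]


def compact(log):
    """Compaction: keep only the newest record for each key, in offset order."""
    latest = {}
    for rec in log:  # log is in offset order, so later records overwrite earlier ones
        latest[rec.key] = rec
    return sorted(latest.values(), key=lambda rec: rec.offset)


log = [
    Record(0, 100, "a", "1"),
    Record(1, 110, "b", "2"),
    Record(2, 120, "a", "3"),
]

compacted = compact(log)
unexpired = expire(log, now=125, ttl_seconds=10)
```

Neither function looks at which consumers have read which records, which is the point of the recap: retention is a property of the log, not of consumption.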
Client Libraries 
● JVM Client supported by the Apache Project https://kafka.apache.org/documentation.html#api 
● Community Clients https://cwiki.apache.org/confluence/display/KAFKA/Clients 
• Python - Pure Python implementation with full protocol support. Consumer and Producer 
implementations included, GZIP and Snappy compression supported. 
• C - High performance C library with full protocol support 
• C++ - Native C++ library with protocol support for Metadata, Produce, Fetch, and Offset. 
• Go (aka golang) Pure Go implementation with full protocol support. Consumer and Producer 
implementations included, GZIP and Snappy compression supported. 
• Ruby - Pure Ruby, Consumer and Producer implementations included, GZIP and Snappy 
compression supported. Ruby 1.9.3 and up (CI runs MRI 2. 
• Clojure - Clojure DSL for the Kafka API 
• JavaScript (NodeJS) - NodeJS client in a pure JavaScript implementation 
• stdin & stdout 
Wire Protocol Developers Guide 
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
● LinkedIn - Apache Kafka is used at LinkedIn for activity 
stream data and operational metrics. This powers various 
products like LinkedIn Newsfeed, LinkedIn Today in addition 
to our offline analytics systems like Hadoop. 
● Twitter - As part of their Storm stream processing 
infrastructure, e.g. this. 
● Netflix - Real-time monitoring and event-processing pipeline. 
● Square - We use Kafka as a bus to move all systems events 
through our various datacenters. This includes metrics, logs, 
custom events etc. On the consumer side, we output into 
Splunk, Graphite, Esper-like real-time alerting. 
● Spotify - Kafka is used at Spotify as part of their log delivery 
system. 
● Pinterest - Kafka is used with Secor as part of their log 
collection pipeline. 
● Uber 
● Tumblr - See this 
● Box - At Box, Kafka is used for the production analytics 
pipeline & real time monitoring infrastructure. We are 
planning to use Kafka for some of the new products & 
● Mozilla - Kafka will soon be replacing part of our current 
production system to collect performance and usage data 
from the end-users browser for projects like Telemetry, Test 
Pilot, etc. Downstream consumers usually persist to either 
HDFS or HBase. 
● Tagged - Apache Kafka drives our new pub sub system 
which delivers real-time events for users in our latest game - 
Deckadence. It will soon be used in a host of new use cases 
including group chat and back end stats and log collection. 
● Foursquare - Kafka powers online to online messaging, and 
online to offline messaging at Foursquare. We integrate with 
monitoring, production systems, and our offline infrastructure, 
including Hadoop. 
● StumbleUpon - Data collection platform for analytics. 
● Coursera - At Coursera, Kafka powers education at scale, 
serving as the data pipeline for realtime learning 
analytics/dashboards. 
● Shopify - Access logs, A/B testing events, domain events ("a 
checkout happened", etc.), metrics, delivery to HDFS, and 
customer reporting. We are now focusing on consumers: 
analytics, support tools, and fraud analysis.
● Mate1.com Inc. - Apache Kafka is used at Mate1 as our main 
event bus that powers our news and activity feeds, 
automated review systems, and will soon power real time 
notifications and log distribution. 
● Boundary - Apache Kafka aggregates high-flow message 
streams into a unified distributed pubsub service, brokering 
the data for other internal systems as part of Boundary's real-time 
network analytics infrastructure. 
● Ancestry.com - Kafka is used as the event log processing 
pipeline for delivering better personalized product and 
service to our customers. 
● DataSift - Apache Kafka is used at DataSift as a collector of 
monitoring events and to track users' consumption of data 
streams in real time. 
http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html 
● Spongecell - We use Kafka to run our entire analytics and 
monitoring pipeline driving both real-time and ETL 
applications for our customers. 
● Wooga - We use Kafka to aggregate and process tracking 
data from all our Facebook games (which are hosted at 
● AddThis - Apache Kafka is used at AddThis to collect events 
generated by our data network and broker that data to our 
analytics clusters and real-time web analytics platform. 
● Urban Airship - At Urban Airship we use Kafka to buffer 
incoming data points from mobile devices for processing by 
our analytics infrastructure. 
● Metamarkets - We use Kafka to ingest real-time event data, 
stream it to Storm and Hadoop, and then serve it from our 
Druid cluster to feed our interactive analytics dashboards. 
We've also built connectors for directly ingesting events from 
Kafka into Druid. 
● Simple - We use Kafka at Simple for log aggregation and to 
power our analytics infrastructure. 
● Gnip - Kafka is used in their twitter ingestion and processing 
pipeline. 
● Loggly - Loggly is the world's most popular cloud-based log 
management. Our cloud-based log management service 
helps DevOps and technical teams make sense of the 
massive quantity of logs. Kafka is used as part of our log 
collection and processing infrastructure.
● RichRelevance - Real-time tracking event pipeline. 
● SocialTwist - We use Kafka internally as part of our reliable 
email queueing system. 
● Countandra - Our hierarchical distributed counting 
engine uses Kafka as a primary speedy interface as well as 
for routing events for cascading counting. 
● FlyHajj.com - We use Kafka to collect all metrics and events 
generated by the users of the website. 
● uSwitch - See this blog. 
● InfoChimps - Kafka is part of the InfoChimps real-time data 
platform. 
● Visual Revenue - We use Kafka as a distributed queue in 
front of our web traffic stream processing infrastructure 
(Storm). 
● Ooyala - Kafka is used as the primary high speed message 
queue to power Storm and our real-time analytics/event 
ingestion pipelines. 
● Datadog - Kafka brokers data to most systems in our metrics 
and events ingestion pipeline. Different modules contribute 
and consume data from it, for streaming CEP (homegrown), 
● VisualDNA - We use Kafka 1. as an infrastructure that helps us 
continuously bring the tracking events from various 
datacenters into our central Hadoop cluster for offline 
processing, 2. as a propagation path for data integration, 3. 
as a real-time platform for future inference and 
recommendation engines. 
● Sematext - in SPM (performance monitoring + alerting), 
Kafka is used for metrics collection and feeds SPM's in-memory 
data aggregation (OLAP cube creation) as well as 
our CEP/Alerts servers (see also: SPM for Kafka 
performance monitoring). In SA (search analytics) Kafka is 
used in search and click stream collection before being 
aggregated and persisted. In Logsene (log analytics) Kafka is 
used to pass logs and other events from front-end receivers 
to the persistent backend. 
● Wize Commerce - At Wize Commerce (previously, NexTag), 
Kafka is used as a distributed queue in front of Storm based 
processing for search index generation. We plan to also use 
it for collecting user generated data on our web tier, landing 
the data into various data sinks like Hadoop, HBase, etc. 
● Quixey - At Quixey, The Search Engine for Apps, Kafka is an 
integral part of our eventing, logging and messaging
● LinkSmart - Kafka is used at LinkSmart as an event stream 
feeding Hadoop and custom real time systems. 
● LucidWorks Big Data - We use Kafka for syncing LucidWorks 
Search (Solr) with incoming data from Hadoop and also for 
sending LucidWorks Search logs back to Hadoop for analysis. 
● Cloud Physics - Kafka is powering our high-flow event 
pipeline that aggregates over 1.2 billion metric series from 
1000+ data centers for near-to-real time data center 
operational analytics and modeling 
● Graylog2 - Graylog2 is a free and open source log 
management and data analysis system. It's using Kafka as 
default transport for Graylog2 Radio. The use case is 
described here. 
● Yieldbot - Yieldbot uses Kafka for real-time events, Camus for 
batch loading, and MirrorMaker for cross-region replication. 
● LivePerson - Using Kafka as the main data bus for all real 
time events. 
● Retention Science - Click stream ingestion and processing. 
● Strava - Powers our analytics pipeline, activity feeds denorm 
and several other production services. 
● Outbrain - We use Kafka in production for real time log collection and 
processing, and for cross-DC cache propagation. 
● SwiftKey - We use Apache Kafka for analytics event processing. 
● Yeller - Yeller uses Kafka to process large streams of incoming exception 
data for its customers. Rate limiting, throttling and batching are all built on 
top of Kafka. 
● Emerging Threats - Emerging Threats uses Kafka in our event pipeline to 
process billions of malware events for search indices, alerting systems, etc. 
● Hotels.com - Hotels.com uses Kafka as a pipeline to collect real time events 
from multiple sources and for sending data to HDFS. 
● Helprace - Kafka is used as a distributed high speed message queue in our 
help desk software as well as our real-time event data aggregation and 
analytics. 
● Exponential is using Kafka in production to power the events ingestion 
pipeline for real time analytics and log feed consumption. 
● Livefyre - uses Kafka for the real time notifications, analytics pipeline and as 
the primary mechanism for general pub/sub. 
● Exoscale - uses Kafka in production. 
● Cityzen Data - uses Kafka as well; we provide a platform for collecting, 
storing and analyzing machine data. 
● Criteo - has used Kafka in production for over a year for stream processing and 
log transfer (over 2M messages/s and growing) 
● The Wikimedia Foundation - uses Kafka as a log transport for analytics data
Where are we going? (0.8.2) 
● 0.8.2-beta is released https://kafka.apache.org/downloads.html 
○ Better consistency vs. availability (min.isr per topic) 
○ LZ4 Compression 
○ Offset Storage away from Zookeeper 
○ Better Topic Management 
○ New Java Producer 
○ Over 100 bug fixes 
○ More! 
https://archive.apache.org/dist/kafka/0.8.2-beta/RELEASE_NOTES.html
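The "consistency vs. availability (min.isr per topic)" item refers to the topic-level `min.insync.replicas` setting: when a producer asks for acknowledgement from all in-sync replicas (`acks=-1`), the write is rejected if fewer than that many replicas are in sync. The sketch below models only that decision. The function and constant names are hypothetical, and the model leaves out everything else a broker does.

```python
ACKS_ALL = -1  # the producer waits for all in-sync replicas


def accepts_write(in_sync_replicas, min_insync_replicas, acks):
    """Toy model of whether a partition leader accepts a produce request."""
    if acks == ACKS_ALL:
        # Favor consistency: refuse the write when too few replicas are in sync.
        return in_sync_replicas >= min_insync_replicas
    # With weaker acks the minimum is not enforced; a live leader is enough.
    return in_sync_replicas >= 1


# Replication factor 3 with a minimum of 2 in-sync replicas:
healthy = accepts_write(in_sync_replicas=3, min_insync_replicas=2, acks=ACKS_ALL)
one_down = accepts_write(in_sync_replicas=2, min_insync_replicas=2, acks=ACKS_ALL)
two_down = accepts_write(in_sync_replicas=1, min_insync_replicas=2, acks=ACKS_ALL)
two_down_weak_acks = accepts_write(in_sync_replicas=1, min_insync_replicas=2, acks=1)
```

With these example values, the topic tolerates one replica falling out of sync and then stops accepting fully acknowledged writes, trading availability for durability.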
Where are we going? (0.9.0) 
● New Consumer 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design 
● Continued operational improvements 
o Command Line Interface / Admin API 
o https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Command+Line+and+Related+Improvements 
● Security https://cwiki.apache.org/confluence/display/KAFKA/Security 
o Authentication 
▪ TLS/SSL 
▪ Kerberos 
o Authorization 
▪ Pluggable 
● Idempotence, Transactions 
https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka
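One common way to make appends idempotent is to tag each record with a producer id and a per-producer sequence number, and have the log drop anything it has already seen. The sketch below illustrates that general technique only. It is not the design described on the linked wiki page, and the `DedupLog` class and its fields are hypothetical.

```python
class DedupLog:
    """Toy log that ignores retried records using per-producer sequence numbers."""

    def __init__(self):
        self.records = []
        self.last_seq = {}  # producer_id -> highest sequence number appended

    def append(self, producer_id, seq, value):
        """Append value; return False if this sequence number was already written."""
        if seq <= self.last_seq.get(producer_id, -1):
            return False  # a retry of something already in the log
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return True


log = DedupLog()
first = log.append("producer-1", 0, "a")
retry = log.append("producer-1", 0, "a")  # e.g. resent after a lost acknowledgement
second = log.append("producer-1", 1, "b")
```

The retried record is acknowledged as a duplicate instead of being written twice, so the producer can resend safely after a timeout.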
Really Quick Start (Scala) 
1) Install Vagrant http://www.vagrantup.com/ 
2) Install Virtual Box https://www.virtualbox.org/ 
3) git clone https://github.com/stealthly/scala-kafka 
4) cd scala-kafka 
5) vagrant up 
Zookeeper will be running on 192.168.86.5 
BrokerOne will be running on 192.168.86.10 
All the tests in ./src/test/scala/* should pass (which is also /vagrant/src/test/scala/* in the vm) 
6) ./gradlew test
Really Quick Start (Go) 
1) Install Vagrant http://www.vagrantup.com/ 
2) Install Virtual Box https://www.virtualbox.org/ 
3) git clone https://github.com/stealthly/go-kafka 
4) cd go-kafka 
5) vagrant up 
6) vagrant ssh brokerOne 
7) cd /vagrant 
8) sudo ./test.sh
Questions? 
/******************************************* 
Joe Stein 
Founder, Principal Consultant 
Big Data Open Source Security LLC 
http://www.stealth.ly 
Twitter: @allthingshadoop 
********************************************/

Contenu connexe

Tendances

Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Knoldus Inc.
 
Copy of Kafka-Camus
Copy of Kafka-CamusCopy of Kafka-Camus
Copy of Kafka-CamusDeep Shah
 
Kafka meetup JP #3 - Engineering Apache Kafka at LINE
Kafka meetup JP #3 - Engineering Apache Kafka at LINEKafka meetup JP #3 - Engineering Apache Kafka at LINE
Kafka meetup JP #3 - Engineering Apache Kafka at LINEkawamuray
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin PodvalMartin Podval
 
Kafka clients and emitters
Kafka clients and emittersKafka clients and emitters
Kafka clients and emittersEdgar Domingues
 
Real time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseReal time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseWill Gardella
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streamingdatamantra
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planningconfluent
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...DataWorks Summit/Hadoop Summit
 
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Data Con LA
 

Tendances (20)

kafka for db as postgres
kafka for db as postgreskafka for db as postgres
kafka for db as postgres
 
kafka
kafkakafka
kafka
 
Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
kafka
kafkakafka
kafka
 
Copy of Kafka-Camus
Copy of Kafka-CamusCopy of Kafka-Camus
Copy of Kafka-Camus
 
Kafka meetup JP #3 - Engineering Apache Kafka at LINE
Kafka meetup JP #3 - Engineering Apache Kafka at LINEKafka meetup JP #3 - Engineering Apache Kafka at LINE
Kafka meetup JP #3 - Engineering Apache Kafka at LINE
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
 
Kafka clients and emitters
Kafka clients and emittersKafka clients and emitters
Kafka clients and emitters
 
Real time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseReal time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and Couchbase
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streaming
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka - Overview
Apache Kafka - OverviewApache Kafka - Overview
Apache Kafka - Overview
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
 
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka
KafkaKafka
Kafka
 

En vedette

Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaJoe Stein
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaJeff Holoman
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkRahul Jain
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaJoe Stein
 
LogStash - Yes, logging can be awesome
LogStash - Yes, logging can be awesomeLogStash - Yes, logging can be awesome
LogStash - Yes, logging can be awesomeJames Turnbull
 
Down and dirty with Elasticsearch
Down and dirty with ElasticsearchDown and dirty with Elasticsearch
Down and dirty with Elasticsearchclintongormley
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache MesosJoe Stein
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Michael Noll
 
Real-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaReal-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaJoe Stein
 
Get started with Developing Frameworks in Go on Apache Mesos
Get started with Developing Frameworks in Go on Apache MesosGet started with Developing Frameworks in Go on Apache Mesos
Get started with Developing Frameworks in Go on Apache MesosJoe Stein
 
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...Michael Noll
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeperSaurav Haloi
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignMichael Noll
 
Kafka for begginer
Kafka for begginerKafka for begginer
Kafka for begginerYousun Jeong
 
jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011Joe Stein
 
Storing Time Series Metrics With Cassandra and Composite Columns
Storing Time Series Metrics With Cassandra and Composite ColumnsStoring Time Series Metrics With Cassandra and Composite Columns
Storing Time Series Metrics With Cassandra and Composite ColumnsJoe Stein
 

En vedette (20)

Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
 
LogStash - Yes, logging can be awesome
LogStash - Yes, logging can be awesomeLogStash - Yes, logging can be awesome
LogStash - Yes, logging can be awesome
 
Down and dirty with Elasticsearch
Down and dirty with ElasticsearchDown and dirty with Elasticsearch
Down and dirty with Elasticsearch
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache Mesos
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
 
Real-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaReal-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache Kafka
 
Get started with Developing Frameworks in Go on Apache Mesos
Get started with Developing Frameworks in Go on Apache MesosGet started with Developing Frameworks in Go on Apache Mesos
Get started with Developing Frameworks in Go on Apache Mesos
 
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
reveal.js 3.0.0
reveal.js 3.0.0reveal.js 3.0.0
reveal.js 3.0.0
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 
Kafka for begginer
Kafka for begginerKafka for begginer
Kafka for begginer
 
Linux training in chandigarh
Linux training in chandigarhLinux training in chandigarh
Linux training in chandigarh
 
Data Pipeline with Kafka
Data Pipeline with KafkaData Pipeline with Kafka
Data Pipeline with Kafka
 
jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011
 
Storing Time Series Metrics With Cassandra and Composite Columns
Storing Time Series Metrics With Cassandra and Composite ColumnsStoring Time Series Metrics With Cassandra and Composite Columns
Storing Time Series Metrics With Cassandra and Composite Columns
 

Similaire à Current and Future of Apache Kafka

Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureTimothy Spann
 
OSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming AppsOSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming AppsTimothy Spann
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Timothy Spann
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaSlim Baltagi
 
Apache frameworks for Big and Fast Data
Apache frameworks for Big and Fast DataApache frameworks for Big and Fast Data
Apache frameworks for Big and Fast DataNaveen Korakoppa
 
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...Data Con LA
 
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...
Devfest uk & ireland  using apache nifi with apache pulsar for fast data on-r...Devfest uk & ireland  using apache nifi with apache pulsar for fast data on-r...
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...Timothy Spann
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
Streaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache KafkaStreaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache KafkaAttunity
 
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing HubIMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing HubIn-Memory Computing Summit
 
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...Kai Wähner
 
Python Kafka Integration: Developers Guide
Python Kafka Integration: Developers GuidePython Kafka Integration: Developers Guide
Python Kafka Integration: Developers GuideInexture Solutions
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
Fault Tolerance with Kafka
Fault Tolerance with KafkaFault Tolerance with Kafka
Fault Tolerance with KafkaEdureka!
 
IoT and Event Streaming at Scale with Apache Kafka
IoT and Event Streaming at Scale with Apache KafkaIoT and Event Streaming at Scale with Apache Kafka
IoT and Event Streaming at Scale with Apache Kafkaconfluent
 
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...Trivadis
 

Similaire à Current and Future of Apache Kafka (20)

Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azure
 
OSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming AppsOSSNA Building Modern Data Streaming Apps



Current and Future of Apache Kafka

  • 1. Apache Kafka The current and future http://kafka.apache.org/
  • 2. Joe Stein
      • Developer, Architect & Technologist
      • Founder & Principal Consultant => Big Data Open Source Security LLC - http://stealth.ly
        Big Data Open Source Security LLC provides professional services and product solutions for the collection, storage, transfer, real-time analytics, batch processing and reporting for complex data streams, data sets and distributed systems. BDOSS is all about the "glue" and helping companies to not only figure out what Big Data Infrastructure Components to use but also how to change their existing (or build new) systems to work with them.
      • Apache Kafka Committer & PMC member
      • Blog & Podcast - http://allthingshadoop.com
      • Twitter @allthingshadoop
  • 3. Apache Kafka
      • Apache Kafka
          o http://kafka.apache.org
      • Apache Kafka Source Code
          o https://github.com/apache/kafka
      • Documentation
          o http://kafka.apache.org/documentation.html
      • FAQ
          o https://cwiki.apache.org/confluence/display/KAFKA/FAQ
      • Wiki
          o https://cwiki.apache.org/confluence/display/KAFKA/Index
  • 4. It often starts with just one data pipeline
  • 5. Reuse of data pipelines for new providers
  • 6. Reuse of existing providers for new consumers
  • 7. Eventually the solution becomes the problem
  • 8. And then it gets worse!
  • 11. A high-throughput distributed messaging system rethought as a distributed commit log.
  • 14. Where we were when things started
  • 17. New (0.8.2-beta) JVM Producer
      1 producer, replication x 3 async: 786,980 records/sec (75.1 MB/sec)
      1 producer, replication x 3 sync: 421,823 records/sec (40.2 MB/sec)
      3 producer, replication x 3 async: 2,024,032 records/sec (193.0 MB/sec)
      End-to-end latency
      2 ms (median)
      3 ms (99th percentile)
      14 ms (99.9th percentile)
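
The slide does not state the record size used in the benchmark, but the two throughput figures on each line imply one. The following is a small worked calculation, assuming the slide's "MB" means 2**20 bytes (an assumption, not something the slide says); under that reading each run works out to roughly 100 bytes per record.

```python
# Benchmark figures copied from the slide: (label, records/sec, MB/sec).
RESULTS = [
    ("1 producer, 3x replication, async", 786_980, 75.1),
    ("1 producer, 3x replication, sync", 421_823, 40.2),
    ("3 producers, 3x replication, async", 2_024_032, 193.0),
]

BYTES_PER_MB = 1024 * 1024  # assumption: "MB" on the slide means 2**20 bytes


def implied_record_size(records_per_sec, mb_per_sec):
    """Average record size in bytes implied by the two throughput figures."""
    return mb_per_sec * BYTES_PER_MB / records_per_sec


for label, records, mb in RESULTS:
    print(f"{label}: ~{implied_record_size(records, mb):.0f} bytes/record")
```

The same function can be turned around to estimate MB/sec for a different record size, though real throughput does not scale linearly with message size, as the following two slides suggest.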
  • 18. Message Size vs Throughput (count)
  • 19. Message Size vs Throughput (MB)
  • 20. Recap
      • Producers - ** push **
          o Batching
          o Compression
          o Sync (Ack), Async (auto batch)
          o Replication for durability and fault tolerance
          o Sequential writes, guaranteed ordering within each partition
      • Consumers - ** pull **
          o No state held by broker
          o Consumers control reading from the stream
      • Zero Copy for producers and consumers to and from the broker http://kafka.apache.org/documentation.html#maximizingefficiency
      • Messages stay on disk after being consumed and are deleted on TTL or compaction https://kafka.apache.org/documentation.html#compaction
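
The push/pull split in the recap can be illustrated with a toy in-memory log. This is a minimal sketch, not Kafka's implementation: the `ToyLog` class, its method names, and the CRC32-based partition choice are all hypothetical and exist only to show ordering within a partition, consumer-held offsets, and replay of messages that stay in the log after being read.

```python
import zlib
from collections import defaultdict


class ToyLog:
    """In-memory stand-in for a topic: one append-only list per partition."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self.partitions = defaultdict(list)

    def append(self, key, value):
        """Producer side (push): hash the key to a partition and append."""
        # Hypothetical partitioner; real clients use their own hashing.
        partition = zlib.crc32(key.encode("utf-8")) % self.num_partitions
        self.partitions[partition].append(value)
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def fetch(self, partition, offset, max_records=10):
        """Consumer side (pull): read from a caller-supplied offset."""
        return self.partitions[partition][offset:offset + max_records]


log = ToyLog(num_partitions=3)
partition, _ = log.append("user-1", "login")
log.append("user-1", "click")
log.append("user-1", "logout")

# The consumer, not the log, keeps track of its position.
position = 0
batch = log.fetch(partition, position, max_records=2)
position += len(batch)
rest = log.fetch(partition, position)

# Reading does not delete anything, so the consumer can rewind and replay.
replay = log.fetch(partition, 0)
```

Because the log keeps no per-consumer state, a second consumer could read the same partition from offset 0 without affecting the first.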
  • 21. Client Libraries
      ● JVM Client supported by the Apache Project https://kafka.apache.org/documentation.html#api
      ● Community Clients https://cwiki.apache.org/confluence/display/KAFKA/Clients
          • Python - Pure Python implementation with full protocol support. Consumer and Producer implementations included, GZIP and Snappy compression supported.
          • C - High performance C library with full protocol support
          • C++ - Native C++ library with protocol support for Metadata, Produce, Fetch, and Offset.
          • Go (aka golang) - Pure Go implementation with full protocol support. Consumer and Producer implementations included, GZIP and Snappy compression supported.
          • Ruby - Pure Ruby, Consumer and Producer implementations included, GZIP and Snappy compression supported. Ruby 1.9.3 and up (CI runs MRI 2.
          • Clojure - Clojure DSL for the Kafka API
          • JavaScript (NodeJS) - NodeJS client in a pure JavaScript implementation
          • stdin & stdout
      ● Wire Protocol Developers Guide https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
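
Community clients are possible because the wire protocol is a documented binary format over TCP. As a rough sketch of what a client author deals with, the snippet below encodes a request header and size-delimits it, following the layout described in the protocol guide linked above (big-endian integers, strings prefixed by an int16 length, every message prefixed by an int32 size). The helper names and the `client_id` value "demo" are hypothetical, and the result is only a header, not a complete request.

```python
import struct


def encode_string(text):
    """Protocol string: int16 length (big-endian) followed by UTF-8 bytes."""
    data = text.encode("utf-8")
    return struct.pack(">h", len(data)) + data


def encode_request_header(api_key, api_version, correlation_id, client_id):
    """Header fields: api_key and api_version (int16), correlation_id (int32), client_id (string)."""
    fixed = struct.pack(">hhi", api_key, api_version, correlation_id)
    return fixed + encode_string(client_id)


def frame(payload):
    """Size-delimit a message: int32 byte count, then the payload itself."""
    return struct.pack(">i", len(payload)) + payload


# api_key 3 is the metadata request in the protocol guide; a real request
# would append its body (for metadata, the list of topics) before framing.
header = encode_request_header(api_key=3, api_version=0, correlation_id=1, client_id="demo")
message = frame(header)
```

The `correlation_id` is echoed back by the broker, which lets a client match responses to requests sent on the same connection.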
  • 22.
      ● LinkedIn - Apache Kafka is used at LinkedIn for activity stream data and operational metrics. This powers various products like LinkedIn Newsfeed, LinkedIn Today in addition to our offline analytics systems like Hadoop.
      ● Twitter - As part of their Storm stream processing infrastructure, e.g. this.
      ● Netflix - Real-time monitoring and event-processing pipeline.
      ● Square - We use Kafka as a bus to move all systems events through our various datacenters. This includes metrics, logs, custom events etc. On the consumer side, we output into Splunk, Graphite, Esper-like real-time alerting.
      ● Spotify - Kafka is used at Spotify as part of their log delivery system.
      ● Pinterest - Kafka is used with Secor as part of their log collection pipeline.
      ● Uber
      ● Tumblr - See this
      ● Box - At Box, Kafka is used for the production analytics pipeline & real time monitoring infrastructure. We are planning to use Kafka for some of the new products &
      ● Mozilla - Kafka will soon be replacing part of our current production system to collect performance and usage data from the end-users browser for projects like Telemetry, Test Pilot, etc. Downstream consumers usually persist to either HDFS or HBase.
      ● Tagged - Apache Kafka drives our new pub sub system which delivers real-time events for users in our latest game - Deckadence. It will soon be used in a host of new use cases including group chat and back end stats and log collection.
      ● Foursquare - Kafka powers online to online messaging, and online to offline messaging at Foursquare. We integrate with monitoring, production systems, and our offline infrastructure, including hadoop.
      ● StumbleUpon - Data collection platform for analytics.
      ● Coursera - At Coursera, Kafka powers education at scale, serving as the data pipeline for realtime learning analytics/dashboards.
      ● Shopify - Access logs, A/B testing events, domain events ("a checkout happened", etc.), metrics, delivery to HDFS, and customer reporting. We are now focusing on consumers: analytics, support tools, and fraud analysis.
  • 23.
      ● Mate1.com Inc. - Apache Kafka is used at Mate1 as our main event bus that powers our news and activity feeds, automated review systems, and will soon power real time notifications and log distribution.
      ● Boundary - Apache Kafka aggregates high-flow message streams into a unified distributed pubsub service, brokering the data for other internal systems as part of Boundary's real-time network analytics infrastructure.
      ● Ancestry.com - Kafka is used as the event log processing pipeline for delivering better personalized product and service to our customers.
      ● DataSift - Apache Kafka is used at DataSift as a collector of monitoring events and to track user's consumption of data streams in real time. http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
      ● Spongecell - We use Kafka to run our entire analytics and monitoring pipeline driving both real-time and ETL applications for our customers.
      ● Wooga - We use Kafka to aggregate and process tracking data from all our Facebook games (which are hosted at
      ● AddThis - Apache Kafka is used at AddThis to collect events generated by our data network and broker that data to our analytics clusters and real-time web analytics platform.
      ● Urban Airship - At Urban Airship we use Kafka to buffer incoming data points from mobile devices for processing by our analytics infrastructure.
      ● Metamarkets - We use Kafka to ingest real-time event data, stream it to Storm and Hadoop, and then serve it from our Druid cluster to feed our interactive analytics dashboards. We've also built connectors for directly ingesting events from Kafka into Druid.
      ● Simple - We use Kafka at Simple for log aggregation and to power our analytics infrastructure.
      ● Gnip - Kafka is used in their Twitter ingestion and processing pipeline.
      ● Loggly - Loggly is the world's most popular cloud-based log management. Our cloud-based log management service helps DevOps and technical teams make sense of the massive quantity of logs. Kafka is used as part of our log collection and processing infrastructure.
  • 24.
      ● RichRelevance - Real-time tracking event pipeline.
      ● SocialTwist - We use Kafka internally as part of our reliable email queueing system.
      ● Countandra - We use a hierarchical distributed counting engine, uses Kafka as a primary speedy interface as well as routing events for cascading counting
      ● FlyHajj.com - We use Kafka to collect all metrics and events generated by the users of the website.
      ● uSwitch - See this blog.
      ● InfoChimps - Kafka is part of the InfoChimps real-time data platform.
      ● Visual Revenue - We use Kafka as a distributed queue in front of our web traffic stream processing infrastructure (Storm).
      ● Ooyala - Kafka is used as the primary high speed message queue to power Storm and our real-time analytics/event ingestion pipelines.
      ● Datadog - Kafka brokers data to most systems in our metrics and events ingestion pipeline. Different modules contribute and consume data from it, for streaming CEP (homegrown),
      ● VisualDNA - We use Kafka
          1. as an infrastructure that helps us bring continuously the tracking events from various datacenters into our central Hadoop cluster for offline processing,
          2. as a propagation path for data integration,
          3. as a real-time platform for future inference and recommendation engines
      ● Sematext - In SPM (performance monitoring + alerting), Kafka is used for metrics collection and feeds SPM's in-memory data aggregation (OLAP cube creation) as well as our CEP/Alerts servers (see also: SPM for Kafka performance monitoring). In SA (search analytics) Kafka is used in search and click stream collection before being aggregated and persisted. In Logsene (log analytics) Kafka is used to pass logs and other events from front-end receivers to the persistent backend.
      ● Wize Commerce - At Wize Commerce (previously, NexTag), Kafka is used as a distributed queue in front of Storm based processing for search index generation. We plan to also use it for collecting user generated data on our web tier, landing the data into various data sinks like Hadoop, HBase, etc.
      ● Quixey - At Quixey, The Search Engine for Apps, Kafka is an integral part of our eventing, logging and messaging
  • 25.
      ● LinkSmart - Kafka is used at LinkSmart as an event stream feeding Hadoop and custom real time systems.
      ● LucidWorks Big Data - We use Kafka for syncing LucidWorks Search (Solr) with incoming data from Hadoop and also for sending LucidWorks Search logs back to Hadoop for analysis.
      ● Cloud Physics - Kafka is powering our high-flow event pipeline that aggregates over 1.2 billion metric series from 1000+ data centers for near-to-real time data center operational analytics and modeling
      ● Graylog2 - Graylog2 is a free and open source log management and data analysis system. It's using Kafka as default transport for Graylog2 Radio. The use case is described here.
      ● Yieldbot - Yieldbot uses Kafka for real-time events, Camus for batch loading, and mirrormakers for x-region replication.
      ● LivePerson - Using Kafka as the main data bus for all real time events.
      ● Retention Science - Click stream ingestion and processing.
      ● Strava - Powers our analytics pipeline, activity feeds denorm and several other production services.
      ● Outbrain - We use Kafka in production for real time log collection and processing, and for cross-DC cache propagation.
      ● SwiftKey - We use Apache Kafka for analytics event processing.
      ● Yeller - Yeller uses Kafka to process large streams of incoming exception data for its customers. Rate limiting, throttling and batching are all built on top of Kafka.
      ● Emerging Threats - Emerging Threats uses Kafka in our event pipeline to process billions of malware events for search indices, alerting systems, etc.
      ● Hotels.com - Hotels.com uses Kafka as pipeline to collect real time events from multiple sources and for sending data to HDFS.
      ● Helprace - Kafka is used as a distributed high speed message queue in our help desk software as well as our real-time event data aggregation and analytics.
      ● Exponential - is using Kafka in production to power the events ingestion pipeline for real time analytics and log feed consumption.
      ● Livefyre - uses Kafka for the real time notifications, analytics pipeline and as the primary mechanism for general pub/sub.
      ● Exoscale - uses Kafka in production.
      ● Cityzen Data - uses Kafka as well, we provide a platform for collecting, storing and analyzing machine data.
      ● Criteo - use Kafka in production for over a year for stream processing and log transfer (over 2M messages/s and growing)
      ● The Wikimedia Foundation - uses Kafka as a log transport for analytics data
  • 26. Where are we going? (0.8.2)
      ● 0.8.2-beta is released https://kafka.apache.org/downloads.html
          ○ Better consistency vs. availability (min.isr per topic)
          ○ LZ4 Compression
          ○ Offset Storage away from Zookeeper
          ○ Better Topic Management
          ○ New Java Producer
          ○ Over 100 bug fixes
          ○ More! https://archive.apache.org/dist/kafka/0.8.2-beta/RELEASE_NOTES.html
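
The "min.isr per topic" item refers to trading availability for consistency: a topic can require a minimum number of in-sync replicas before a fully acknowledged write is accepted (the topic-level setting is named `min.insync.replicas`). The function below is a simplified model of that decision, not broker code; `accept_write` and the broker names are hypothetical.

```python
def accept_write(in_sync_replicas, min_insync_replicas, acks_all=True):
    """Return True if the partition leader should accept the write.

    Simplified model: when the producer asks for acknowledgement from all
    in-sync replicas, the write is refused if fewer than
    `min_insync_replicas` replicas are currently in sync.
    """
    if not acks_all:
        return True  # weaker ack settings are not gated by ISR size in this model
    return len(in_sync_replicas) >= min_insync_replicas


healthy = accept_write({"broker-1", "broker-2", "broker-3"}, min_insync_replicas=2)
degraded = accept_write({"broker-1"}, min_insync_replicas=2)
```

A higher minimum favors consistency (writes fail rather than land on too few replicas), while a minimum of 1 favors availability.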
  • 27. Where are we going? (0.8.2)
  • 28. Where are we going? (0.8.2)
  • 29. Where are we going? (0.8.2)
  • 30. Where are we going? (0.9.0)
      ● New Consumer https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design
      ● Continued operational improvements
          o Command Line Interface / Admin API
          o https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Command+Line+and+Related+Improvements
      ● Security https://cwiki.apache.org/confluence/display/KAFKA/Security
          o Authentication
              • TLS/SSL
              • Kerberos
          o Authorization
              • Pluggable
      ● Idempotence, Transactions https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka
  • 31. Really Quick Start (Scala)
      1) Install Vagrant http://www.vagrantup.com/
      2) Install Virtual Box https://www.virtualbox.org/
      3) git clone https://github.com/stealthly/scala-kafka
      4) cd scala-kafka
      5) vagrant up
         Zookeeper will be running on 192.168.86.5
         BrokerOne will be running on 192.168.86.10
         All the tests in ./src/test/scala/* should pass (which is also /vagrant/src/test/scala/* in the vm)
      6) ./gradlew test
  • 32. Really Quick Start (Go)
      1) Install Vagrant http://www.vagrantup.com/
      2) Install Virtual Box https://www.virtualbox.org/
      3) git clone https://github.com/stealthly/go-kafka
      4) cd go-kafka
      5) vagrant up
      6) vagrant ssh brokerOne
      7) cd /vagrant
      8) sudo ./test.sh
  • 33. Questions?
      /*******************************************
      Joe Stein
      Founder, Principal Consultant
      Big Data Open Source Security LLC
      http://www.stealth.ly
      Twitter: @allthingshadoop
      ********************************************/