SlideShare une entreprise Scribd logo
1  sur  22
Télécharger pour lire hors ligne
Kafka Evaluation for
Data Pipeline and ETL
Shafaq Abdullah @ GREE
Date: 10/21/2013
High-Level Architecture
Performance
Scalability
Fault-Tolerance/Error Recovery
Operations and Monitoring
Summary
Outline
Kafka -
A distributed messaging pub/sub system
In Production: Linkedin, FourSquare, Twitter, GREE (almost)
Usage:
- Log aggregation (user activity stream)
- Real-time events
- Monitoring
- Queuing
Intro
Point-to-Point Data Pipelines
Game
Server 1
Logs
Hadoop
...
User
Tracking
SecuritySearch
Social
Graph
Rules/
Recomm
endation/
Engine
VerticaOps
DataWare
House
Message- Central Data Channel
Game
Server 1
Logs
Hadoop
...
User
Tracking
SecuritySearch
Social
Graph
Rules/
Recomm
endation/
Engine
VerticaOps
Data
Warehouse
Message Queue
Topic-
A String representing Message Stream Id
Partition-
Logical Division per topic-level within Broker, for
writing logs generated by producer.
e.g: kafkaTopic - kafkaTopic1, kafkaTopic2
Replica-Sets-
Replica within Broker for a certain partition, with
Leader(writes) and follower (read) balanced using hash-
key modulu.
Kafka Jargon
Log- Message Queue
Log- Message Queue
AWS m1.Large instance
Dual Core Intel Xeon 64-
bit@2.27GHz
7.5 GB RAM
2 x 420 GB hardisk
Hardware Spec of Broker Cluster
Performance Results
Producer thread Transactions Processing time (s)
Throughput
(transaction/sec)
1 488396 64.129 7616
2 1195748 110.868 10785
4 1874713 140.375 13355
10 47269410 338.094 13981
17 7987317 568.028 14061
Latency vs Durability
Ack Status TIme to publish (ms) Tradeoff
No Ack 0.7 Greater data loss
Wait for ack 1.5 Lesser data loss
1. Create a Kafka-Replica Sink
2. Feed data to Copier via Kafka-Sink
3. Benchmark Kakfa-Copier-Vertica Pipeline
4. Improve/Refactor for Performance
Integration Plan
C- Consistency (Producer Sync Mode)
A- Availability (Replication)
P- Partition Tolerance (Cluster in same
network- No Network delay)
Strongly Consistent and
Highly Available
CA-P theorem
Conventional Quoram Replication:
2f+1 replicas → f failures (e.g. ZooKeeper)
Kafka Replication:
f+1 replicas → f failures
Failure Recovery
Leader :
➢  Message is propagated to follower
➢  Commit offset is checkpointed to disk
Follower failure and Recovery:
➢  Kicked out of ISR
➢  After restart, truncates log to last commit
➢  Catches up with leader → ISR
Error Handling: Follower Failure
➢ Embedded Controller via ZK detects
leader failure
➢ Leader election from ISR
➢ Committed message not lost
Error Handling: Leader Failure
- Horizontally scalable
Add partitions in Broker Cluster as higher
throughput needed (~10Mb/s /server)
- Balancing for producer and consumer
Scalability
•  Number of messages the consumer lags behind the producer by
•  Max lag in messages btw follower and leader replica
•  Unclean leader election rate
•  Is controller active on broke
Monitoring via JMX
3 Node cluster ~ 25 MB/s, < 20 ms latency
(end2end)
having replication factor of 3 with 6
consumer group
Monthly operations
$500 + $250 (zookeeper 3 node) + 1 mm
Operation Costs
- No callback in producer’s send()
- Fully automatic balancing till script is
manually run- Balancing layer of topics
via ZK
Nothing is perfect
• Infinite scaling with 10Mb/s /server with
< 10 ms latency
• Costs <$1000 /month + 1 man-month
• No impact on current pipeline
• 0.8 final release due in days to be used
in production
Summary

Contenu connexe

Tendances

Building a Replicated Logging System with Apache Kafka
Building a Replicated Logging System with Apache KafkaBuilding a Replicated Logging System with Apache Kafka
Building a Replicated Logging System with Apache KafkaGuozhang Wang
 
Reliable Message Delivery with Apache Kafka
Reliable Message Delivery with Apache Kafka Reliable Message Delivery with Apache Kafka
Reliable Message Delivery with Apache Kafka confluent
 
HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon
 
Deep Dive into Apache Kafka
Deep Dive into Apache KafkaDeep Dive into Apache Kafka
Deep Dive into Apache Kafkaconfluent
 
Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...
Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...
Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...NoSQLmatters
 
Kafka on ZFS: Better Living Through Filesystems
Kafka on ZFS: Better Living Through Filesystems Kafka on ZFS: Better Living Through Filesystems
Kafka on ZFS: Better Living Through Filesystems confluent
 
How Zhaopin contributes to Pulsar community
How Zhaopin contributes to Pulsar communityHow Zhaopin contributes to Pulsar community
How Zhaopin contributes to Pulsar communityStreamNative
 
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...confluent
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin PodvalMartin Podval
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controllerconfluent
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon
 
OVN operationalization at scale at eBay
OVN operationalization at scale at eBayOVN operationalization at scale at eBay
OVN operationalization at scale at eBayAliasgar Ginwala
 
Webinar Slides : Migrating to MySQL, MariaDB Galera and/or Percona XtraDB Clu...
Webinar Slides : Migrating to MySQL, MariaDB Galera and/or Percona XtraDB Clu...Webinar Slides : Migrating to MySQL, MariaDB Galera and/or Percona XtraDB Clu...
Webinar Slides : Migrating to MySQL, MariaDB Galera and/or Percona XtraDB Clu...Severalnines
 
Extending functionality in nginx, with modules!
Extending functionality in nginx, with modules!Extending functionality in nginx, with modules!
Extending functionality in nginx, with modules!Trygve Vea
 
The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...
The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...
The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...HostedbyConfluent
 
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBayReal-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBayAltinity Ltd
 
Peter Zaitsev "18 ways to fix MySQL bottlenecks"
Peter Zaitsev "18 ways to fix MySQL bottlenecks"Peter Zaitsev "18 ways to fix MySQL bottlenecks"
Peter Zaitsev "18 ways to fix MySQL bottlenecks"Fwdays
 

Tendances (20)

Building a Replicated Logging System with Apache Kafka
Building a Replicated Logging System with Apache KafkaBuilding a Replicated Logging System with Apache Kafka
Building a Replicated Logging System with Apache Kafka
 
Reliable Message Delivery with Apache Kafka
Reliable Message Delivery with Apache Kafka Reliable Message Delivery with Apache Kafka
Reliable Message Delivery with Apache Kafka
 
HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environment
 
Deep Dive into Apache Kafka
Deep Dive into Apache KafkaDeep Dive into Apache Kafka
Deep Dive into Apache Kafka
 
Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...
Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...
Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...
 
Kafka on ZFS: Better Living Through Filesystems
Kafka on ZFS: Better Living Through Filesystems Kafka on ZFS: Better Living Through Filesystems
Kafka on ZFS: Better Living Through Filesystems
 
How Zhaopin contributes to Pulsar community
How Zhaopin contributes to Pulsar communityHow Zhaopin contributes to Pulsar community
How Zhaopin contributes to Pulsar community
 
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBase
 
OVN operationalization at scale at eBay
OVN operationalization at scale at eBayOVN operationalization at scale at eBay
OVN operationalization at scale at eBay
 
Webinar Slides : Migrating to MySQL, MariaDB Galera and/or Percona XtraDB Clu...
Webinar Slides : Migrating to MySQL, MariaDB Galera and/or Percona XtraDB Clu...Webinar Slides : Migrating to MySQL, MariaDB Galera and/or Percona XtraDB Clu...
Webinar Slides : Migrating to MySQL, MariaDB Galera and/or Percona XtraDB Clu...
 
Apache Kafka Demo
Apache Kafka DemoApache Kafka Demo
Apache Kafka Demo
 
Hbase Nosql
Hbase NosqlHbase Nosql
Hbase Nosql
 
Extending functionality in nginx, with modules!
Extending functionality in nginx, with modules!Extending functionality in nginx, with modules!
Extending functionality in nginx, with modules!
 
The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...
The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...
The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...
 
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBayReal-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
 
Peter Zaitsev "18 ways to fix MySQL bottlenecks"
Peter Zaitsev "18 ways to fix MySQL bottlenecks"Peter Zaitsev "18 ways to fix MySQL bottlenecks"
Peter Zaitsev "18 ways to fix MySQL bottlenecks"
 

En vedette

Amazon shafaq v6-10-12-11
Amazon shafaq v6-10-12-11Amazon shafaq v6-10-12-11
Amazon shafaq v6-10-12-11Shafaq Abdullah
 
A little about Message Queues - Boston Riak Meetup
A little about Message Queues - Boston Riak MeetupA little about Message Queues - Boston Riak Meetup
A little about Message Queues - Boston Riak MeetupBasho Technologies
 
MongoDB MapReduce Business Intelligence
MongoDB MapReduce Business IntelligenceMongoDB MapReduce Business Intelligence
MongoDB MapReduce Business IntelligenceShafaq Abdullah
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Christopher Curtin
 
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapRHadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapRDouglas Bernardini
 
Performance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsPerformance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsDataWorks Summit/Hadoop Summit
 
Monitoring Apache Kafka with Confluent Control Center
Monitoring Apache Kafka with Confluent Control Center   Monitoring Apache Kafka with Confluent Control Center
Monitoring Apache Kafka with Confluent Control Center confluent
 
Distributed stream processing with Apache Kafka
Distributed stream processing with Apache KafkaDistributed stream processing with Apache Kafka
Distributed stream processing with Apache Kafkaconfluent
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka StreamsGuozhang Wang
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaJeff Holoman
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaJoe Stein
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkRahul Jain
 
Hadoop data ingestion
Hadoop data ingestionHadoop data ingestion
Hadoop data ingestionVinod Nayal
 

En vedette (15)

Amazon shafaq v6-10-12-11
Amazon shafaq v6-10-12-11Amazon shafaq v6-10-12-11
Amazon shafaq v6-10-12-11
 
Elk meetup
Elk meetupElk meetup
Elk meetup
 
A little about Message Queues - Boston Riak Meetup
A little about Message Queues - Boston Riak MeetupA little about Message Queues - Boston Riak Meetup
A little about Message Queues - Boston Riak Meetup
 
MongoDB MapReduce Business Intelligence
MongoDB MapReduce Business IntelligenceMongoDB MapReduce Business Intelligence
MongoDB MapReduce Business Intelligence
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
 
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapRHadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
 
Performance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsPerformance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data Platforms
 
Elk stack
Elk stackElk stack
Elk stack
 
Monitoring Apache Kafka with Confluent Control Center
Monitoring Apache Kafka with Confluent Control Center   Monitoring Apache Kafka with Confluent Control Center
Monitoring Apache Kafka with Confluent Control Center
 
Distributed stream processing with Apache Kafka
Distributed stream processing with Apache KafkaDistributed stream processing with Apache Kafka
Distributed stream processing with Apache Kafka
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
Hadoop data ingestion
Hadoop data ingestionHadoop data ingestion
Hadoop data ingestion
 

Similaire à Kafka Evaluation - High Throughout Message Queue

Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Productionconfluent
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingDibyendu Bhattacharya
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
Apache Kafka
Apache KafkaApache Kafka
Apache KafkaJoe Stein
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Monal Daxini
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processingconfluent
 
Kafka practical experience
Kafka practical experienceKafka practical experience
Kafka practical experienceRico Chen
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Multitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINEMultitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINEkawamuray
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache KafkaAmir Sedighi
 
Kafka and ibm event streams basics
Kafka and ibm event streams basicsKafka and ibm event streams basics
Kafka and ibm event streams basicsBrian S. Paskin
 
F_1330_Narkhede_Kafka .pptx
F_1330_Narkhede_Kafka .pptxF_1330_Narkhede_Kafka .pptx
F_1330_Narkhede_Kafka .pptxNIMITJAIN71
 
Pulsar - Distributed pub/sub platform
Pulsar - Distributed pub/sub platformPulsar - Distributed pub/sub platform
Pulsar - Distributed pub/sub platformMatteo Merli
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...Athens Big Data
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafkaSamuel Kerrien
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...Yahoo Developer Network
 
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...confluent
 

Similaire à Kafka Evaluation - High Throughout Message Queue (20)

Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Production
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processing
 
Kafka practical experience
Kafka practical experienceKafka practical experience
Kafka practical experience
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Multitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINEMultitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINE
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
Kafka and ibm event streams basics
Kafka and ibm event streams basicsKafka and ibm event streams basics
Kafka and ibm event streams basics
 
F_1330_Narkhede_Kafka .pptx
F_1330_Narkhede_Kafka .pptxF_1330_Narkhede_Kafka .pptx
F_1330_Narkhede_Kafka .pptx
 
Pulsar - Distributed pub/sub platform
Pulsar - Distributed pub/sub platformPulsar - Distributed pub/sub platform
Pulsar - Distributed pub/sub platform
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
 
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
 

Dernier

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Dernier (20)

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Kafka Evaluation - High Throughout Message Queue

  • 1. Kafka Evaluation for Data Pipeline and ETL Shafaq Abdullah @ GREE Date: 10/21/2013
  • 3. Kafka - A distributed messaging pub/sub system In Production: Linkedin, FourSquare, Twitter, GREE (almost) Usage: - Log aggregation (user activity stream) - Real-time events - Monitoring - Queuing Intro
  • 4. Point-to-Point Data Pipelines Game Server 1 Logs Hadoop ... User Tracking SecuritySearch Social Graph Rules/ Recomm endation/ Engine VerticaOps DataWare House
  • 5. Message- Central Data Channel Game Server 1 Logs Hadoop ... User Tracking SecuritySearch Social Graph Rules/ Recomm endation/ Engine VerticaOps Data Warehouse Message Queue
  • 6. Topic- A String representing Message Stream Id Partition- Logical Division per topic-level within Broker, for writing logs generated by producer. e.g: kafkaTopic - kafkaTopic1, kafkaTopic2 Replica-Sets- Replica within Broker for a certain partition, with Leader(writes) and follower (read) balanced using hash- key modulu. Kafka Jargon
  • 9. AWS m1.Large instance Dual Core Intel Xeon 64- bit@2.27GHz 7.5 GB RAM 2 x 420 GB hardisk Hardware Spec of Broker Cluster
  • 10. Performance Results Producer thread Transactions Processing time (s) Throughput (transaction/sec) 1 488396 64.129 7616 2 1195748 110.868 10785 4 1874713 140.375 13355 10 47269410 338.094 13981 17 7987317 568.028 14061
  • 11. Latency vs Durability Ack Status TIme to publish (ms) Tradeoff No Ack 0.7 Greater data loss Wait for ack 1.5 Lesser data loss
  • 12. 1. Create a Kafka-Replica Sink 2. Feed data to Copier via Kafka-Sink 3. Benchmark Kakfa-Copier-Vertica Pipeline 4. Improve/Refactor for Performance Integration Plan
  • 13. C- Consistency (Producer Sync Mode) A- Availability (Replication) P- Partition Tolerance (Cluster in same network- No Network delay) Strongly Consistent and Highly Available CA-P theorem
  • 14. Conventional Quoram Replication: 2f+1 replicas → f failures (e.g. ZooKeeper) Kafka Replication: f+1 replicas → f failures Failure Recovery
  • 15.
  • 16. Leader : ➢  Message is propagated to follower ➢  Commit offset is checkpointed to disk Follower failure and Recovery: ➢  Kicked out of ISR ➢  After restart, truncates log to last commit ➢  Catches up with leader → ISR Error Handling: Follower Failure
  • 17. ➢ Embedded Controller via ZK detects leader failure ➢ Leader election from ISR ➢ Committed message not lost Error Handling: Leader Failure
  • 18. - Horizontally scalable Add partitions in Broker Cluster as higher throughput needed (~10Mb/s /server) - Balancing for producer and consumer Scalability
  • 19. •  Number of messages the consumer lags behind the producer by •  Max lag in messages btw follower and leader replica •  Unclean leader election rate •  Is controller active on broke Monitoring via JMX
  • 20. 3 Node cluster ~ 25 MB/s, < 20 ms latency (end2end) having replication factor of 3 with 6 consumer group Monthly operations $500 + $250 (zookeeper 3 node) + 1 mm Operation Costs
  • 21. - No callback in producer’s send() - Fully automatic balancing till script is manually run- Balancing layer of topics via ZK Nothing is perfect
  • 22. • Infinite scaling with 10Mb/s /server with < 10 ms latency • Costs <$1000 /month + 1 man-month • No impact on current pipeline • 0.8 final release due in days to be used in production Summary