Kafka Vs Kinesis
Agenda
1. Kafka architecture high level overview
2. Comparison with Kinesis in terms of throughput and cost
3. Headaches with Kinesis and Kafka
4. Use case for the data team
5. Reasons for switching
6. Success stories
7. References
Kafka Architecture
Very similar to Kinesis!
That shouldn’t come as a surprise, as Kinesis was inspired by Kafka.
Architecture (Contd..)
Kinesis | Kafka
Stream | Topic
Shard | Partition
DynamoDB tables | Zookeeper
Working
▶ The Kafka broker stores all messages in the partitions configured for that particular topic. Messages are spread across those partitions (roughly evenly when no message key is set).
▶ Once a consumer subscribes to a topic, Kafka provides the topic's current offset to the consumer and also saves the offset in the Zookeeper ensemble.
▶ The consumer polls Kafka at a regular (configurable) interval for new messages.
▶ Once the messages are processed, the consumer sends an acknowledgement to the Kafka broker.
▶ Once Kafka receives an acknowledgement, it changes the offset to the new value and
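The poll / process / acknowledge cycle described on this slide can be sketched with a toy in-memory model. This is not the Kafka client API: `ToyPartition`, `consume_once` and the `event-N` messages are hypothetical names invented for illustration, and the offset is kept in a plain Python field instead of a Zookeeper ensemble.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPartition:
    """In-memory stand-in for one topic partition (hypothetical, not a Kafka API)."""
    messages: list = field(default_factory=list)
    committed_offset: int = 0  # index of the next message the consumer should read

    def append(self, message):
        """Producer side: add a message to the end of the log."""
        self.messages.append(message)

    def poll(self, max_records=10):
        """Return up to max_records messages starting at the committed offset."""
        start = self.committed_offset
        return self.messages[start:start + max_records]

    def acknowledge(self, count):
        """Advance the stored offset once `count` messages have been processed."""
        self.committed_offset += count


def consume_once(partition, handler, max_records=10):
    """One poll cycle: fetch a batch, process it, then acknowledge it."""
    batch = partition.poll(max_records)
    for message in batch:
        handler(message)
    partition.acknowledge(len(batch))
    return len(batch)


partition = ToyPartition()
for i in range(5):
    partition.append(f"event-{i}")

processed = []
consume_once(partition, processed.append, max_records=3)  # reads event-0..2
consume_once(partition, processed.append, max_records=3)  # reads event-3..4
```

Because the offset only moves after the handler has run, a consumer that restarts between polls would resume from the last acknowledged position in this model.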
How do you scale?
▶ Consumer side scaling -
  ▶ Each application instance is part of a consumer group and reads from at least one partition of the topic it is subscribed to. (Consumer group A)
  ▶ When additional application instances are added to the consumer group, Kafka reassigns partitions so that each new instance can read from at least one partition, provided the topic has at least as many partitions as the group has instances. (Consumer group B)
▶ Producer side scaling -
  ▶ During producer spikes, a producer can write to multiple partitions across multiple brokers. Throughput is bounded by the network card I/O capacity and the disks attached to the broker.
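A minimal sketch of consumer-group rebalancing, assuming a simple round-robin rule. Kafka's real assignors are more involved; `assign_partitions` and the `app-N` instance names are hypothetical and only illustrate that each partition is owned by exactly one instance in a group.

```python
def assign_partitions(partitions, consumers):
    """Toy round-robin assignment: every partition goes to exactly one consumer."""
    if not consumers:
        raise ValueError("a consumer group needs at least one instance")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

# Two instances share four partitions.
group_a = assign_partitions(partitions, ["app-1", "app-2"])

# After scaling out to four instances, each one owns a single partition.
group_b = assign_partitions(partitions, ["app-1", "app-2", "app-3", "app-4"])

# A fifth instance has nothing left to read from.
oversized = assign_partitions(
    partitions, ["app-1", "app-2", "app-3", "app-4", "app-5"]
)
```

The last case shows why the partition count caps consumer-side parallelism: instances beyond the number of partitions sit idle.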
Throughput
▶ Kinesis (per shard)
  ▶ Write - up to 1,000 records per second, up to a maximum total data write rate of 1 MB per second (including partition keys)
  ▶ Read - up to 5 transactions per second, up to a maximum total data read rate of 2 MB per second
  ▶ Retention - 1 day by default
▶ Kafka
  ▶ Write - Dependent on the network card
  ▶ Read - Dependent on the network card
  ▶ Retention - 7 days by default (configurable)
Throughput and cost comparison - Kafka
▶ Test setup -
  ▶ Cluster - Three machines, each with an Intel Xeon 2.5 GHz six-core processor
  ▶ Six 7200 RPM SATA drives, 32 GB of RAM, 1 Gbps Ethernet
  ▶ Equivalent EC2 instance - t2.2xlarge priced at $0.376 per hour.
▶ Test - Single producer thread, 3x asynchronous replication
▶ Record size - 100 bytes.
▶ Throughput recorded with a cluster of 3 machines - 786,980 records/sec (75.1 MB/sec) being consumed and persisted in the Kafka cluster.
▶ Total cost of the cluster per hour - 0.376 * 3 = $1.128 (excluding Zookeeper)
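The Kafka figures on this slide can be checked with a few lines of arithmetic. The constants come from the slide; the variable names are illustrative, and the sketch assumes the quoted MB/sec figure uses 1 MB = 1024 * 1024 bytes, which is the reading that reproduces 75.1.

```python
RECORDS_PER_SEC = 786_980        # benchmark throughput from the slide
RECORD_SIZE_BYTES = 100          # record size from the slide
INSTANCE_PRICE_PER_HOUR = 0.376  # t2.2xlarge price quoted on the slide
BROKER_COUNT = 3

# Raw byte rate pushed through the cluster.
bytes_per_sec = RECORDS_PER_SEC * RECORD_SIZE_BYTES

# Assumption: the slide's "MB" is 1024 * 1024 bytes.
mb_per_sec = bytes_per_sec / (1024 * 1024)

# Hourly cost of the three brokers, Zookeeper not included.
cluster_cost_per_hour = INSTANCE_PRICE_PER_HOUR * BROKER_COUNT

print(f"{mb_per_sec:.1f} MB/sec, ${cluster_cost_per_hour:.3f} per hour")
```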
Throughput and cost comparison - Kinesis
▶ Kinesis shard capacity - 1 MB/sec.
▶ Total number of shards required for a comparable test - 75.
▶ Cost per shard - $0.015 / hour.
▶ Cost of PUT Payload Units, per 1,000,000 units - $0.014
▶ Total number of Payload Units per hour - (75 MB/sec * 3600 sec) / 25 KB ≈ 10.8 million - (1), assuming records are aggregated into full 25 KB units
▶ Millions of Payload Units per hour - (1) / 1M - Around 11
▶ Total cost - 75 * 0.015 + 11 * 0.014 ≈ $1.28
So, total cost is around the same - $1.13/hour for Kafka (without Zookeeper) vs $1.28/hour for Kinesis.
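The Kinesis side of the comparison can be worked through the same way. Prices and limits are the ones quoted on the slide; the variable names are illustrative. The sketch assumes 1 MB = 1000 KB and that records are packed into full 25 KB payload units, since a 100-byte record sent on its own would be billed as one whole unit.

```python
import math

TARGET_MB_PER_SEC = 75               # throughput to match, from the Kafka test
SHARD_WRITE_MB_PER_SEC = 1           # per-shard write capacity
SHARD_PRICE_PER_HOUR = 0.015         # dollars per shard-hour
PUT_PRICE_PER_MILLION_UNITS = 0.014  # dollars per 1,000,000 PUT payload units
PAYLOAD_UNIT_KB = 25                 # size of one PUT payload unit

# Shards needed to absorb the write rate.
shards = math.ceil(TARGET_MB_PER_SEC / SHARD_WRITE_MB_PER_SEC)

# Payload units per hour, assuming fully packed 25 KB units.
kb_per_hour = TARGET_MB_PER_SEC * 1000 * 3600
payload_units_per_hour = kb_per_hour / PAYLOAD_UNIT_KB

shard_cost = shards * SHARD_PRICE_PER_HOUR
put_cost = payload_units_per_hour / 1_000_000 * PUT_PRICE_PER_MILLION_UNITS
total_cost_per_hour = shard_cost + put_cost

print(f"{shards} shards, {payload_units_per_hour:,.0f} units/hour, "
      f"${total_cost_per_hour:.2f} per hour")
```

Under these assumptions the shard-hours dominate the bill; the PUT charge is the smaller term.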
More detailed comparison
Headaches with Kinesis
Limits on Kinesis suck -
1. Kinesis has a limit of 5 reads per second from a shard. So, if we built 5 components that each needed to read and process the same data from a shard, we would already have maxed out Kinesis. This seems like an unnecessary limitation on scaling out consumers. Of course, there are workarounds such as increasing the number of shards, but then you end up paying more too. The front end of Kinesis has a load balancer; the back end does not. Thus, the strong limit.
2. DescribeStream API limit - 10 calls per account per second. A lot of these calls are made by the KCL, which means shard monitoring and scaling up and down are subject to failure.
3. Other bugs, like “vanishing history” after shard splitting, and more worker leases than the total number of workers available.
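To make the first limit concrete, here is a rough sketch of how the per-shard read budget shrinks as independent consumers are added. It assumes the 5 reads/sec and 2 MB/sec limits are split evenly between consumers, which is a simplification; `per_consumer_budget` is a hypothetical helper, not part of any AWS SDK.

```python
SHARD_READS_PER_SEC = 5    # per-shard read transaction limit from the slides
SHARD_READ_MB_PER_SEC = 2  # per-shard read bandwidth limit from the slides


def per_consumer_budget(consumer_count):
    """Share of one shard's read limits per consumer, assuming an even split."""
    if consumer_count < 1:
        raise ValueError("consumer_count must be at least 1")
    return {
        "reads_per_sec": SHARD_READS_PER_SEC / consumer_count,
        "min_poll_interval_sec": consumer_count / SHARD_READS_PER_SEC,
        "mb_per_sec": SHARD_READ_MB_PER_SEC / consumer_count,
    }


one_consumer = per_consumer_budget(1)
five_consumers = per_consumer_budget(5)
```

With five components on one shard, each can poll only once per second in this model, and a sixth would push every consumer below that.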
Headaches with Kafka
▶ Main concern → Everything needs to be self-managed.
▶ These concerns should be alleviated after the Kafka as a service launch.
Use case for the data team
Kafka
Why switch from Kinesis to Kafka
▶ Capable of handling a massive volume of messages.
▶ Easier to scale out. Can scale vertically as well.
▶ A new AWS instance can be launched and a Kafka broker started on it within 1-2 minutes in us-west-1, using EBS to minimize data transfer (as per Confluent).
▶ Lower end-to-end latency than Kinesis, as Kinesis writes its data synchronously to 3 locations before it confirms a put request. Kafka supports async replication.
▶ More mature than Kinesis, fewer bugs.
▶ More flexible than Kinesis, no service-imposed limits.
▶ Huge open source support.
▶ Plenty of success stories where Kafka is used as the log and materialized views are constructed on top of it, using Spark, Samza, Storm, Flink etc.
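The latency point comes down to when the broker confirms a write. On the producer side this is governed by the `acks` property. The sketch below only builds a properties dictionary; `producer_config` is a hypothetical helper and the broker address is a placeholder, so nothing here connects to a cluster.

```python
def producer_config(wait_for_all_replicas):
    """Build Kafka producer properties for the chosen acknowledgement mode."""
    return {
        "bootstrap.servers": "localhost:9092",  # placeholder broker address
        # acks="1": the leader confirms after its own write, and followers
        # replicate afterwards (asynchronous replication, lower latency).
        # acks="all": the leader waits for the in-sync replicas first
        # (synchronous replication, higher latency).
        "acks": "all" if wait_for_all_replicas else "1",
    }


low_latency = producer_config(wait_for_all_replicas=False)
durable = producer_config(wait_for_all_replicas=True)
```

The same switch lets a team trade latency for durability per producer, which a fixed synchronous write path does not offer.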
Companies using Kafka
How Netflix uses Kafka on AWS
Questions/Comments/Suggestions?
References
▶ Architecture - https://kafka.apache.org/documentation/
▶ Throughput study - https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
▶ Kinesis Limits - http://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html
▶ Detailed comparison - http://go.datapipe.com/whitepaper-kafka-vs-kinesis-download
▶ How Netflix uses Kafka - https://medium.com/netflix-techblog/kafka-inside-keystone-pipeline-dd5aeabaf6bb


Editor's notes

  1. Source - https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
  2. Source - http://go.datapipe.com/whitepaper-kafka-vs-kinesis-download
  3. Source - http://go.datapipe.com/whitepaper-kafka-vs-kinesis-download
  4. Source - http://go.datapipe.com/whitepaper-kafka-vs-kinesis-download