No Data Loss Pipeline
with Apache Kafka
Jiangjie (Becket) Qin @ LinkedIn
Data loss and message reordering
● Data loss
o producer.send(record) is called but the record did not end up in the consumer as expected
● Message reordering
o send(record1) is called before send(record2)
o record2 shows up in the broker before record1 does
o matters in cases like DB replication
Kafka based data pipeline
[Diagram: Producer → Kafka Cluster (Colo 1) → Mirror Maker → Kafka Cluster (Colo 2) → Consumer]
Today’s Agenda:
● No data loss
● No message reordering
● Mirror maker enhancement
○ Customized consumer rebalance listener
○ Message handler
Synchronous send is safe but slow...
producer.send(record).get()
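For reference, a minimal sketch of this synchronous pattern with the Java producer; the broker address and topic name are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SyncSendExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                new ProducerRecord<>("my-topic", "key", "value"); // placeholder topic
            // Blocking on the returned Future makes every send wait for the broker's
            // response, which is safe but caps throughput at one record per round trip.
            RecordMetadata metadata = producer.send(record).get();
            System.out.printf("acked %s-%d@%d%n",
                metadata.topic(), metadata.partition(), metadata.offset());
        }
    }
}
```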
Using asynchronous send with callback can be tricky
producer.send(record,callback)
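The asynchronous counterpart, as a sketch; it assumes a producer configured as in the previous sketch, and what to do inside onCompletion is exactly where it gets tricky:

```java
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class AsyncSendExample {
    public static void sendAsync(Producer<String, String> producer) {
        ProducerRecord<String, String> record =
            new ProducerRecord<>("my-topic", "key", "value"); // placeholder topic
        // send() returns immediately; the callback fires once the broker acks the
        // record or the client gives up on it.
        producer.send(record, new Callback() {
            @Override
            public void onCompletion(RecordMetadata metadata, Exception exception) {
                if (exception != null) {
                    // Only logging here silently drops the record, while naively
                    // resending it can reorder messages -- see the following slides.
                    exception.printStackTrace();
                }
            }
        });
    }
}
```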
Producer can cause data loss when
● block.on.buffer.full=false
● retries are exhausted
● sending message without using acks=all
Is this good enough?
producer.send(record,callback)
● block.on.buffer.full=TRUE
● retries=Long.MAX_VALUE
● acks=all
● resend in callback when message send failed
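A sketch of these settings combined with "resend in callback". Two caveats not on the slide: block.on.buffer.full only exists on older clients (newer clients block by default, bounded by max.block.ms), and retries is an int config, so Integer.MAX_VALUE stands in for the slide's Long.MAX_VALUE. As the next slides show, this still is not enough to preserve ordering.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class ResendingProducer {
    private final Producer<byte[], byte[]> producer;

    public ResendingProducer(Properties props) {
        props.put("acks", "all");
        props.put("retries", Integer.toString(Integer.MAX_VALUE));
        this.producer = new KafkaProducer<>(props);
    }

    public void send(final ProducerRecord<byte[], byte[]> record) {
        producer.send(record, new Callback() {
            @Override
            public void onCompletion(RecordMetadata metadata, Exception exception) {
                if (exception != null) {
                    // "Resend in callback": re-queue the record after the client gave up.
                    // Note the callback runs on the producer's I/O thread, so anything
                    // heavy or blocking here is risky in practice.
                    send(record);
                }
            }
        });
    }
}
```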
Message reordering might happen if:
● max.in.flight.requests.per.connection > 1, AND
● retries are enabled
[Timeline: the producer sends message 0 and then message 1; message 0 fails and is retried, so it reaches the broker after message 1.]
Message reordering might also happen if:
● producer is closed carelessly
o close producer in user thread, or
o close without using close(0)
[Diagram: msg 0 fails with an ack exception; the callback notifies the user thread, which closes the producer, but before that close takes effect the sender thread has already sent msg 1 from the record accumulator.]
● close producer in the callback on error
● close producer with close(0) to prevent further sending after previous message send failed
[Diagram: msg 0 fails with an ack exception; the callback calls close(0) directly, so msg 1 still sitting in the record accumulator is never sent.]
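A minimal sketch of the close(0)-in-callback pattern. The Duration-based close assumes a newer Java client; older clients spell it producer.close(0, TimeUnit.MILLISECONDS):

```java
import java.time.Duration;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.RecordMetadata;

public class CloseOnErrorCallback implements Callback {
    private final Producer<byte[], byte[]> producer;

    public CloseOnErrorCallback(Producer<byte[], byte[]> producer) {
        this.producer = producer;
    }

    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        if (exception != null) {
            // Closing with a zero timeout from inside the callback (i.e. from the
            // sender thread) stops records queued behind the failed one from being
            // sent, which is what prevents reordering here.
            producer.close(Duration.ZERO);
        }
    }
}
```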
To prevent data loss:
● block.on.buffer.full=TRUE
● retries=Long.MAX_VALUE (for some use cases)
● acks=all
To prevent reordering:
● max.in.flight.requests.per.connection=1
● close producer in callback with close(0) on send failure
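Pulling the producer-side settings together, a sketch of a no-loss, no-reordering configuration (serializers and broker address are placeholders; block.on.buffer.full applies only to older clients, which newer clients replace with max.block.ms):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;

public class NoLossProducerFactory {
    public static Producer<byte[], byte[]> create(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");

        // No data loss: wait for all in-sync replicas and keep retrying
        // (retries is an int config, so Integer.MAX_VALUE plays the role of
        // the slide's Long.MAX_VALUE).
        props.put("acks", "all");
        props.put("retries", Integer.toString(Integer.MAX_VALUE));

        // No reordering: at most one in-flight request per connection.
        props.put("max.in.flight.requests.per.connection", "1");

        return new KafkaProducer<>(props);
    }
}
```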
Not a perfect solution:
● Producer needs to be closed to guarantee message order. E.g. in mirror maker, one message send failure to a topic should not affect the whole pipeline.
● When the producer is down, messages still in the buffer will be lost
Correct producer settings are not enough
● acks=all can still lose data when unclean leader election happens.
● Two replicas are needed at any time to guarantee data persistence.
● replication factor >= 3
● min.isr = 2
● Replication factor > min.isr
o If replication factor = min.isr, partition will be offline when one replica is down
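To make this concrete, a sketch of creating such a topic with the Java AdminClient; the broker address, topic name, and partition count are placeholders:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateDurableTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker

        try (AdminClient admin = AdminClient.create(props)) {
            // Replication factor 3 with min.insync.replicas=2: acks=all then needs two
            // replicas to persist every write, and one replica can still be down
            // without taking the partition offline.
            NewTopic topic = new NewTopic("my-topic", 8, (short) 3); // placeholder topic
            topic.configs(Collections.singletonMap("min.insync.replicas", "2"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```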
Settings we use:
● replication factor = 3
● min.isr = 2
● unclean leader election disabled
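The same settings expressed as broker-side defaults, sketched as a server.properties fragment (standard broker config names; a per-topic min.insync.replicas, as in the previous sketch, overrides the broker default):

```properties
# Broker defaults matching the settings above
default.replication.factor=3
min.insync.replicas=2
unclean.leader.election.enable=false
```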
● Consumer might lose messages when offsets are committed carelessly, e.g. committing offsets before the messages are fully processed
o Disable automatic offset commits
o Commit offsets only after the messages are processed
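A sketch of the process-then-commit loop with the Java consumer; broker address, group id, and topic are placeholders, and the auto-commit property was called auto.commit.enable on the old consumer:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ProcessThenCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "no-data-loss-demo");        // placeholder group
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false"); // no automatic offset commits

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // fully process before committing
                }
                // Commit only after everything returned by this poll has been processed.
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("%s-%d@%d: %s%n",
            record.topic(), record.partition(), record.offset(), record.value());
    }
}
```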
Kafka based data pipeline
Today’s Agenda:
● No data loss
● No message reordering
● Mirror maker enhancement
○ Customized consumer rebalance listener
○ Message handler
Mirror Maker Enhancement
● Consume-then-produce pattern
● Only commit consumer offsets of messages acked by target cluster
● Default to no-data-loss and no-reordering settings
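A simplified sketch of the consume-then-produce loop. The real mirror maker tracks per-partition unacked offsets instead of flushing per batch, but the invariant is the same: source offsets are committed only for messages the target cluster has acked. The consumer and producer are assumed to be configured with the settings from the earlier slides.

```java
import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ConsumeThenProduceLoop {
    public static void mirror(KafkaConsumer<byte[], byte[]> sourceConsumer,
                              KafkaProducer<byte[], byte[]> targetProducer,
                              String topic) {
        sourceConsumer.subscribe(Collections.singletonList(topic));
        while (true) {
            ConsumerRecords<byte[], byte[]> batch = sourceConsumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<byte[], byte[]> record : batch) {
                targetProducer.send(
                    new ProducerRecord<>(record.topic(), record.key(), record.value()));
            }
            // flush() blocks until every record in this batch has been acked by the
            // target cluster, so the commit below only covers acked messages.
            targetProducer.flush();
            sourceConsumer.commitSync();
        }
    }
}
```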
Mirror Maker Enhancement
● Customized Consumer Rebalance Listener
o Can be used to propagate topic change from source cluster to target cluster, e.g. partition number change, new topic creation.
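For illustration only: mirror maker wires in its own listener implementation, but the sketch below uses the plain consumer-side ConsumerRebalanceListener interface to show where such propagation logic can hook in. ensureTargetTopic() is a hypothetical helper (it could be built on the AdminClient API).

```java
import java.util.Collection;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

public class TopicPropagatingRebalanceListener implements ConsumerRebalanceListener {

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        for (TopicPartition tp : partitions) {
            // On (re)assignment, make sure the target cluster has the topic with at
            // least as many partitions as the source cluster.
            ensureTargetTopic(tp.topic(), tp.partition() + 1);
        }
    }

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Typically: flush the producer and commit offsets before giving up partitions.
    }

    private void ensureTargetTopic(String topic, int minPartitions) {
        // Hypothetical helper: create the topic or expand partitions on the target
        // cluster, e.g. via the AdminClient API.
    }
}
```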
Mirror Maker Enhancement
● Customized Message Handler, useful for
o partition-to-partition mirror
o filtering out messages
o message format conversion
o other simple message processing
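A sketch of the idea behind a message handler. The interface below is a simplified stand-in, not mirror maker's actual plug-in API: each consumed record is mapped to zero or more records to produce on the target cluster.

```java
import java.util.Collections;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.ProducerRecord;

public interface MessageHandler {
    List<ProducerRecord<byte[], byte[]>> handle(ConsumerRecord<byte[], byte[]> record);
}

// Example: partition-to-partition mirroring that also filters out empty messages.
class PartitionPreservingHandler implements MessageHandler {
    @Override
    public List<ProducerRecord<byte[], byte[]>> handle(ConsumerRecord<byte[], byte[]> record) {
        if (record.value() == null || record.value().length == 0) {
            return Collections.emptyList(); // drop empty messages
        }
        // Produce to the same topic and the same partition number on the target cluster.
        return Collections.singletonList(new ProducerRecord<>(
            record.topic(), record.partition(), record.key(), record.value()));
    }
}
```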
Mirror Maker Enhancement
● Startup/shutdown acceleration
o parallelized startup and shutdown
o a 26-node cluster with 4 consumers each takes about 1 min to start up and shut down
Q&A
Editor's notes

  1. What if we just stop producing on send failure? No retry.