Speaker: Jun Rao, VP of Apache Kafka and Co-founder of Confluent
The controller is the brain of Apache Kafka®. A big part of what the controller does is to maintain the consistency of the replicas and determine which replica can serve the clients, especially during individual broker failures.
In this talk, Jun will outline the main data flow in the controller—in particular, when a broker fails, how the controller automatically promotes another replica as the leader to serve the clients, and when a broker is started, how the controller resumes the replication pipeline in the restarted broker. Jun will then describe recent improvements to the controller that allow it to handle certain edge cases correctly and increase its performance, which allows for more partitions in a Kafka cluster.
Jun Rao is the co-founder of Confluent, a company that provides a streaming data platform on top of Apache Kafka. Previously, Jun was a senior staff engineer at LinkedIn, where he led the development of Kafka, and a researcher at IBM's Almaden Research Center, where he conducted research on database and distributed systems. Jun is the PMC chair of Apache Kafka and a committer of Apache Cassandra. He writes at https://cnfl.io/blog-jun-rao.
3. High-Level Data Flow in Replication
[Diagram: topic1-part1 replicated across brokers 1–3. (1) The producer sends a record to the leader replica on broker 1; (2) the follower replicas on brokers 2 and 3 fetch the record from the leader; (3) once all in-sync replicas have the record, the leader commits it; (4) the leader acks the producer. The consumer reads committed records from the leader.]
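The numbered steps can be sketched as a minimal, single-process model. All names here (`Replica`, `Partition`, `produce`) are illustrative, not Kafka's actual classes, and the network hops are collapsed into method calls: the leader appends, followers fetch, and the producer is acked only once every in-sync replica has the record.

```python
# Illustrative model of leader/follower replication and commit/ack.
# Hypothetical names; real Kafka does each step over the network.

class Replica:
    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []                 # this replica's copy of the partition log

class Partition:
    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = followers

    def produce(self, record):
        """Steps 1-4: append to leader, followers fetch, commit, then ack."""
        self.leader.log.append(record)            # 1: producer -> leader
        for f in self.followers:                  # 2: followers fetch from leader
            f.log.append(record)
        committed = all(record in r.log           # 3: commit once all ISR have it
                        for r in [self.leader] + self.followers)
        return "ack" if committed else "retry"    # 4: ack the producer

# Usage: leader on broker 1, followers on brokers 2 and 3 (as in the diagram).
p = Partition(Replica(1), [Replica(2), Replica(3)])
print(p.produce("m1"))  # ack
```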
4. What’s the controller
• One broker in a cluster acts as the controller
• Monitor the liveness of brokers
• Elect new leaders on broker failure
• Communicate new leaders to brokers
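The election step can be sketched roughly as follows. This is a simplification, assuming (as Kafka's default unclean-leader-election-disabled behavior does) that the new leader is the first live replica, in assignment order, that is still in the partition's ISR; the function name and signature are illustrative.

```python
# Sketch of controller-side leader election on broker failure.
# Simplified; names are hypothetical, not Kafka's actual API.

def elect_leader(assigned_replicas, isr, live_brokers):
    """Return the new leader's broker id, or None if no live ISR member
    exists (in which case the partition goes offline)."""
    for broker in assigned_replicas:          # preserve assignment order
        if broker in isr and broker in live_brokers:
            return broker
    return None

# Broker 1 (the old leader) fails; the controller promotes broker 2.
print(elect_leader([1, 2, 3], isr={1, 2, 3}, live_brokers={2, 3}))  # 2
```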
7. Controlled shutdown
[Diagram: broker 0 hosts the controller, which coordinates through ZooKeeper. Before shutdown, broker 1 leads partitions t-0 and t-1 and broker 2 follows both. On SIGTERM, broker 1 sends a controlled-shutdown request to the controller (1); the controller elects broker 2 as the new leader of both partitions (2), records the new leader (broker 2) in ZooKeeper under /topics/t/0 and /topics/t/1 (3), communicates the new leaders to broker 2 (4), and finally lets broker 1 complete its shutdown (5).]
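The controller's core bookkeeping during a controlled shutdown can be sketched as below. This is a sketch under simplifying assumptions (the function name and data shapes are hypothetical): for each partition led by the shutting-down broker, pick another in-sync replica as leader; the real controller additionally writes each change to ZooKeeper and notifies the brokers.

```python
# Sketch of leadership movement during controlled shutdown.
# Hypothetical names; the real controller also updates ZooKeeper
# and sends leader-change requests to the remaining brokers.

def controlled_shutdown(shutting_down, leaders, isr):
    """Return an updated partition -> leader map with every partition
    led by `shutting_down` moved to another in-sync replica (or None
    if no other ISR member exists)."""
    new_leaders = {}
    for partition, leader in leaders.items():
        if leader == shutting_down:
            candidates = [b for b in isr[partition] if b != shutting_down]
            new_leaders[partition] = candidates[0] if candidates else None
        else:
            new_leaders[partition] = leader
    return new_leaders

# Broker 1 leads t-0 and t-1; both move to broker 2 on shutdown.
print(controlled_shutdown(
    shutting_down=1,
    leaders={"t-0": 1, "t-1": 1},
    isr={"t-0": [1, 2], "t-1": [1, 2]}))  # {'t-0': 2, 't-1': 2}
```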
8. Issues with controlled shutdown (pre 1.1)
• Writes to ZK are serial (impact: longer shutdown time)
• Communication of new leaders not batched (impact: client timeout)
[Diagram: the post-shutdown state from the previous slide: broker 2 now leads t-0 and t-1 and the controller on broker 0 issues the ZooKeeper writes to /topics/t/0 and /topics/t/1 one at a time, with a separate leader-change notification per partition.]
12. Issues with controller failover (pre 1.1)
• Reads from ZK are serial (impact: availability)
• Zombie old controller (impact: inconsistency)
[Diagram: the old controller dies; broker 2 claims the /controller path in ZooKeeper and becomes the new controller (1-2), then reloads partition state (/topic/t1/0 leader:1, /topic/t1/1 leader:3, …, /topic/t1/9 leader:2) from ZooKeeper one read at a time (3), while the zombie old controller may still send requests to the brokers.]
13. Performance improvements in 1.1
• Controller uses async ZK api for reads/writes
• Controller communicates new leaders to brokers in batches
[Diagram: old (serial) approach: the ZK operations for part 1 through part 4 complete one after another; new (pipelined) approach: the operations for part 1 through part 4 overlap in flight.]
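A back-of-envelope model shows why pipelining helps so much. In the serial case every operation pays a full round trip to ZooKeeper; pipelined, the round trips overlap, so the total is roughly one round trip plus the per-operation processing cost. The numbers below are purely illustrative assumptions, not measurements from the talk.

```python
# Illustrative latency model for serial vs. pipelined ZK operations.

def serial_time(n_ops, rtt):
    """Serial: each operation waits for the previous one's round trip."""
    return n_ops * rtt

def pipelined_time(n_ops, rtt, per_op):
    """Pipelined: requests overlap in flight; total is roughly one round
    trip plus the server-side per-operation processing time."""
    return rtt + n_ops * per_op

# Assumed numbers: 10,000 ZK writes, 2 ms round trip, 0.05 ms per-op cost.
print(serial_time(10_000, 0.002))              # ~20 seconds
print(pipelined_time(10_000, 0.002, 0.00005))  # ~0.5 seconds
```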
14. Controlled shutdown (post 1.1)
• Writes to ZK pipelined
• Communication of new leaders batched
[Diagram: the same controlled-shutdown flow: broker 1 hands leadership of t-0 and t-1 to broker 2 via the controller on broker 0, but now the ZooKeeper writes to /topics/t/0 and /topics/t/1 are pipelined and the new leaders are delivered to broker 2 in a single batched request.]
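Batching can be sketched as a grouping step: instead of one message per partition, all leader changes destined for the same broker are collapsed into one request. The function and data shapes below are hypothetical stand-ins for the controller's batched leader-change requests.

```python
from collections import defaultdict

# Sketch of batching leader updates per destination broker.
# Hypothetical names; real Kafka sends batched LeaderAndIsr requests.

def batch_leader_updates(updates, replicas):
    """Group leader changes into one message per destination broker.
    `updates` maps partition -> new leader; `replicas` maps partition ->
    brokers hosting a replica (each must learn the new leader)."""
    per_broker = defaultdict(dict)
    for partition, leader in updates.items():
        for broker in replicas[partition]:
            per_broker[broker][partition] = leader
    return dict(per_broker)

# Two partitions move to broker 2: one batched message per broker,
# instead of one message per partition per broker.
print(batch_leader_updates(
    updates={"t-0": 2, "t-1": 2},
    replicas={"t-0": [1, 2], "t-1": [1, 2]}))
```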
16. Results for controlled shutdown
• 5 ZK nodes and 5 brokers on different racks
• 25K topics, each with 1 partition and 2 replicas
• 10K partitions per broker
                          Kafka 1.0.0   Kafka 1.1.0
Controlled shutdown time  6.5 minutes   3 seconds
17. Results for controller failover
• 5 ZK nodes and 5 brokers on different racks
• 2K topics, each with 50 partitions and 1 replica
• Controller failover: reload 100K partitions from ZK
                   Kafka 1.0.0   Kafka 1.1.0
State reload time  28 seconds    14 seconds
18. Fencing zombie requests from controller
• Zombie controller
  • ZK session expiration: better handling in the controller (1.1)
  • Controller path deletion: writes to ZK conditioned on controller epoch (2.1)
• Zombie request from broker restart
  • Broker epoch (KIP-380 in 2.2)
  • Also fixed the missing ZK watcher issue
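The epoch-based fencing idea can be sketched as follows: every controller generation carries a monotonically increasing epoch, and a broker rejects any request whose epoch is older than the newest it has seen, so a zombie controller cannot overwrite newer state. The class and method names are illustrative, not Kafka's actual request handling code.

```python
# Sketch of epoch-based fencing of zombie controller requests.
# Hypothetical names; the real mechanism rides on Kafka's control requests.

class Broker:
    def __init__(self):
        self.controller_epoch = 0   # newest controller epoch seen so far
        self.state = {}             # partition -> leader, as told by controller

    def handle_leader_update(self, epoch, partition, leader):
        if epoch < self.controller_epoch:
            return "fenced"             # stale epoch: a zombie controller
        self.controller_epoch = epoch   # remember the newest epoch
        self.state[partition] = leader
        return "applied"

b = Broker()
print(b.handle_leader_update(2, "t-0", 1))  # applied (new controller, epoch 2)
print(b.handle_leader_update(1, "t-0", 3))  # fenced (zombie with old epoch 1)
print(b.state)  # {'t-0': 1}
```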
19. Summary
• Significant performance improvement in controller in 1.1
• Allow 10X more partitions in a Kafka cluster
• Better fencing of zombie requests from controller (1.1, 2.1, 2.2)
• More details and remaining work in KAFKA-5027