
A Deep Dive into Kafka Controller

Presentation at Strata Data Conference 2018, New York

The controller is the brain of Apache Kafka. A big part of what the controller does is maintain the consistency of the replicas and determine which replica can be used to serve clients, especially during individual broker failures.
Jun Rao outlines the main data flow in the controller: in particular, how the controller automatically promotes another replica to leader when a broker fails, and how it resumes the replication pipeline on a broker that has been restarted.

Jun then describes recent improvements to the controller that allow it to handle certain edge cases correctly and improve its performance, which allows for more partitions in a Kafka cluster.
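
As a concrete starting point, here is a minimal sketch, in Java, of how a client can observe the state the controller maintains: which broker currently acts as controller and which replica leads each partition. It uses the standard Kafka AdminClient; the bootstrap address and the topic name "t1" are assumptions for illustration.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.TopicPartitionInfo;

import java.util.Collections;
import java.util.Properties;

public class ControllerView {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        try (AdminClient admin = AdminClient.create(props)) {
            // The controller is just one of the brokers; describeCluster() reports which one.
            Node controller = admin.describeCluster().controller().get();
            System.out.println("Current controller: broker " + controller.id());

            // The leader assignments the controller maintains, as seen by a client.
            TopicDescription t1 = admin.describeTopics(Collections.singletonList("t1"))
                                       .all().get().get("t1");
            for (TopicPartitionInfo p : t1.partitions()) {
                System.out.printf("t1-%d: leader broker %d, replicas %s%n",
                                  p.partition(), p.leader().id(), p.replicas());
            }
        }
    }
}
```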


  1. A Deep Dive into Kafka Controller. Jun Rao, VP of Apache Kafka, Co-founder of Confluent
  2. Apache Kafka overview • Core: pub/sub • Connect: integration • Streams: processing
  3. Kafka adoption in enterprises • 6 of the top 10 travel companies • 8 of the top 10 insurance companies • 7 of the top 10 global banks • 9 of the top 10 telecom companies
  4. Kafka Replication • Configurable replication factor • Tolerating f – 1 failures with f replicas • Automated failover [diagram: replicas of topic1 and topic2 partitions spread across brokers 1-4]
  5. High Level Data Flow in Replication [diagram: (1) producer sends to the leader of topic1-part1 on broker 1; (2) followers on brokers 2 and 3 fetch the record; (3) followers acknowledge; (4) the leader commits and acks the producer; the consumer reads committed messages] (see the producer sketch after this list)
  6. What's the controller • One broker in a cluster acts as controller • Monitor the liveness of brokers • Elect new leaders on broker failure • Communicate new leaders to brokers
  7. Controller election • Zookeeper: /controller → broker 0 [diagram: brokers 0-3; broker 0 becomes the controller] (see the election sketch after this list)
  8. Partition state: stored in ZK, cached in controller • Zookeeper: /topic/t1/0 → leader:1, /topic/t1/1 → leader:3, …, /topic/t1/9 → leader:2 [diagram: controller on broker 0, brokers 1-3]
  9. Controlled shutdown [diagram: (1) broker 1, leader for parts t-0 and t-1, receives SIG_TERM; (2) broker 1 notifies the controller on broker 0; (3) the controller writes the new leaders to Zookeeper (/topics/t/0 → 2, /topics/t/1 → 2); (4) the controller makes broker 2 the leader for t-0 and t-1; (5) the controller tells broker 1 it can shut down]
  10. Issues with controlled shutdown (pre 1.1) • Writes to ZK are serial. Impact: longer shutdown time • Communication of new leaders not batched. Impact: client timeouts [diagram: steps 3-5 from the previous slide]
  11. Controller failover [diagram: (1) broker 0, the current controller, fails; its Zookeeper entry /controller → broker 0 lapses]
  12. Controller failover [diagram: (2) broker 2 wins the election and Zookeeper records /controller → broker 2]
  13. Controller failover [diagram: (3) the new controller on broker 2 reloads partition state from Zookeeper: /topic/t1/0 → leader:1, /topic/t1/1 → leader:3, …, /topic/t1/9 → leader:2]
  14. Issues with controller failover (pre 1.1) • Reads from ZK are serial. Impact: availability • Zombie old controller. Impact: inconsistency [diagram: failover steps 1-3 from the previous slides]
  15. Performance improvements in 1.1 • Controller uses the async ZK API for reads/writes (see the async ZK sketch after this list) • Controller communicates new leaders to brokers in batches [diagram: old (serial) vs. new (pipelined) handling of parts 1-4]
  16. Controlled shutdown (post 1.1) • Writes to ZK pipelined • Communication of new leaders batched [diagram: same flow as slide 9 (/topics/t/0 → 2, /topics/t/1 → 2), with steps 3-5 pipelined and batched]
  17. Controller failover (post 1.1) • Reads from ZK pipelined [diagram: failover steps 1-3, with the state reload in step 3 pipelined]
  18. Results for controlled shutdown • 5 ZK nodes and 5 brokers on different racks • 25K topics, 1 partition, 2 replicas • 10K partitions per broker • Controlled shutdown time: 6.5 minutes in Kafka 1.0.0 vs. 3 seconds in Kafka 1.1.0
  19. Results for controller failover • 5 ZK nodes and 5 brokers on different racks • 2K topics, 50 partitions, 1 replica • Controller failover: reload 100K partitions from ZK • State reload time: 28 seconds in Kafka 1.0.0 vs. 14 seconds in Kafka 1.1.0
  20. Fencing zombie controller • ZK session expiration: better handling in the controller (1.1) • Controller path deletion • Writes to ZK conditioned on controller epoch (to be in 2.1) (see the conditional-write sketch after this list)
  21. Controller failover (expected in 2.1) [diagram: (1) broker 2 becomes the new controller, Zookeeper /controller → broker 2; (2) the zombie old controller on broker 0 is fenced]
  22. Summary • Significant performance improvement in the controller in 1.1 • Allows 10X more partitions in a Kafka cluster • Better fencing of zombie controller in 1.1 and 2.1 • More details in KAFKA-5027
  23. Future work in controller • Further improvement on controller failover: standby controller • Better handling of quick broker restart (KAFKA-1120): broker generation
  24. Q/A • Acknowledgment: Onur Karaman, Manikumar Reddy, Prasanna Gautam, Ismael Juma, Mickael Maison, Sandor Murakozi, Rajini Sivaram, Ted Yu, Zhanxiang Huang • Apache Kafka: http://kafka.apache.org/ • Confluent: http://confluent.io/
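
The producer sketch referenced from slide 5. The replication data flow there is what a producer exercises with acks=all: the leader waits until the in-sync followers have fetched the record before acknowledging. This is a minimal sketch with the standard Java producer; the bootstrap address, topic name, and key/value are assumptions.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ReplicatedSend {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=all: the leader acknowledges only after the in-sync followers
        // have fetched the record, i.e. the commit/ack steps on slide 5.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // get() blocks until the record is committed on the in-sync replicas.
            producer.send(new ProducerRecord<>("topic1", "key", "value")).get();
        }
    }
}
```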
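
The election sketch referenced from slide 7. The election follows ZooKeeper's standard ephemeral-node pattern: every broker races to create /controller, exactly one succeeds, and the node vanishes when the winner's session expires, which is what triggers a re-election on controller failure. This is a minimal sketch of that pattern with the plain ZooKeeper client; the method name and JSON payload are illustrative, not Kafka's actual code.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

import java.nio.charset.StandardCharsets;

public class ControllerElection {
    // Each broker races to create the same ephemeral node; exactly one wins.
    static boolean tryBecomeController(ZooKeeper zk, int brokerId) throws Exception {
        byte[] data = ("{\"brokerid\":" + brokerId + "}").getBytes(StandardCharsets.UTF_8);
        try {
            // EPHEMERAL: the node is deleted automatically when this broker's
            // ZK session expires, freeing the slot for a new election.
            zk.create("/controller", data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;  // this broker is now the controller
        } catch (KeeperException.NodeExistsException e) {
            return false; // another broker won; watch /controller and retry on deletion
        }
    }
}
```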
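
The async ZK sketch referenced from slide 15. The async API is what makes pipelining possible: instead of paying one blocking round trip per partition, the caller issues all reads at once over a single connection and collects the responses as they stream back. This is a minimal sketch of that pattern, assuming partition state lives at one ZK path per partition; the method name is illustrative.

```java
import org.apache.zookeeper.AsyncCallback;
import org.apache.zookeeper.ZooKeeper;

import java.util.List;
import java.util.concurrent.CountDownLatch;

public class PipelinedReads {
    // Old (serial) style: one blocking getData() per partition.
    // New (pipelined) style, sketched here: fire all reads, then wait once.
    static void loadPartitionState(ZooKeeper zk, List<String> partitionPaths)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(partitionPaths.size());
        AsyncCallback.DataCallback cb = (rc, path, ctx, data, stat) -> {
            // Deserialize leader/ISR state from `data` here.
            done.countDown();
        };
        for (String path : partitionPaths) {
            zk.getData(path, false, cb, null); // returns immediately; no per-call round trip
        }
        done.await(); // all responses have streamed back
    }
}
```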
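
The conditional-write sketch referenced from slide 20. Conditioning writes on the controller epoch is a compare-and-set: each write states which epoch it believes is current, and ZooKeeper rejects it if a newer controller has taken over. The sketch below stands in for that idea using ZooKeeper's per-node version as the condition; Kafka's actual 2.1 mechanism conditions on the controller epoch itself.

```java
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class FencedWrite {
    // Writes carry the version we last read. A zombie controller holding a
    // stale view fails the version check, so its writes are rejected
    // instead of overwriting the newer controller's state.
    static void writeLeader(ZooKeeper zk, String partitionPath, byte[] newState,
                            int expectedVersion) throws Exception {
        try {
            zk.setData(partitionPath, newState, expectedVersion);
        } catch (KeeperException.BadVersionException e) {
            // A newer controller updated the node since we read it: we are
            // the zombie. Abort rather than clobber newer state.
            throw new IllegalStateException("Fenced: controller state is stale", e);
        }
    }
}
```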
