This talk covers Kafka cluster sizing, instance type selection, scaling operations, replication throttling, and more. Don’t forget to check out the Kafka-Kit repository.
https://www.youtube.com/watch?time_continue=2613&v=7uN-Vlf7W5E
Kafka Makes Capacity Planning Easy
Through the lens of the Universal Scalability Law:
- low contention and crosstalk
- no complex queries
Exposes mostly bandwidth problems:
- highly sequential, batched ops
- the primary workload is streaming reads/writes of bytes
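For reference, this is Gunther’s Universal Scalability Law, which models relative capacity at concurrency N with a contention coefficient σ and a crosstalk (coherency) coefficient κ; Kafka’s sequential, streaming workload keeps both coefficients small:

    C(N) = N / (1 + σ(N - 1) + κN(N - 1))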
Kafka Makes Capacity Planning Hard
The default tools weren’t made for scaling:
- reassign-partitions is focused on simple partition placement
No administrative API:
- no endpoint to inspect or manipulate resources
Created Kafka-Kit (open-source):
- topicmappr for intelligent partition placement
- registry (WIP): a Kafka gRPC/HTTP API
Defined a simple workload pattern:
- topics are bound to specific broker sets (“pools”)
- multiple pools per cluster
- primary drivers: disk capacity & network bandwidth
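A tiny, hypothetical Go sketch of that workload pattern (these are illustrative types, not Kafka-Kit’s actual ones): topics pinned to named broker pools, with many pools per cluster.

    // Package pools is a hypothetical illustration of the pool-based workload pattern.
    package pools

    // Pool is a named set of brokers that exclusively serves a set of topics.
    type Pool struct {
        Name    string
        Brokers []int    // broker IDs belonging to this pool
        Topics  []string // topics bound to this pool
    }

    // Cluster is composed of many pools, each sized and scaled independently,
    // driven primarily by disk capacity and network bandwidth.
    type Cluster struct {
        Pools []Pool
    }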
A Scaling Model
- topicmappr builds optimal partition -> broker pool mappings
- topic/pool sets are scaled individually
- topicmappr handles repairs, storage rebalancing, and pool expansion
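To make the partition -> broker pool mapping concrete, here is a toy Go sketch that emits a mapping in the standard reassignment JSON format that kafka-reassign-partitions consumes and topicmappr produces. The round-robin placement below is purely illustrative; topicmappr’s real placement also optimizes storage balance and broker degree distribution.

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // assignment mirrors the reassignment JSON accepted by kafka-reassign-partitions.
    type assignment struct {
        Version    int         `json:"version"`
        Partitions []placement `json:"partitions"`
    }

    type placement struct {
        Topic     string `json:"topic"`
        Partition int    `json:"partition"`
        Replicas  []int  `json:"replicas"`
    }

    // naivePlacement maps a topic's partitions onto a broker pool round-robin.
    func naivePlacement(topic string, partitions, rf int, pool []int) assignment {
        a := assignment{Version: 1}
        for p := 0; p < partitions; p++ {
            replicas := make([]int, 0, rf)
            for r := 0; r < rf; r++ {
                replicas = append(replicas, pool[(p+r)%len(pool)])
            }
            a.Partitions = append(a.Partitions, placement{Topic: topic, Partition: p, Replicas: replicas})
        }
        return a
    }

    func main() {
        out, _ := json.Marshal(naivePlacement("events", 4, 3, []int{1001, 1002, 1003, 1004}))
        fmt.Println(string(out))
    }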
A large cluster is composed of:
- dozens of pools
- hundreds of brokers
Instance Types
- If there’s a huge delta between the broker counts required for network vs storage, it’s probably the wrong type
- Remember: sequential, bandwidth-bound workloads
- AWS: d2, i3, h1 class
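A back-of-the-envelope Go sketch of that check (all figures below are hypothetical placeholders, not recommendations): compute the broker count a pool needs for storage and for network separately; a large gap between the two suggests a mismatched instance type for this sequential, bandwidth-bound workload.

    package main

    import (
        "fmt"
        "math"
    )

    // brokersFor returns the broker counts required to satisfy a pool's storage
    // and network needs, given per-broker capacities.
    func brokersFor(retainedBytes, perBrokerBytes, ingressBytesPerSec, replication, perBrokerNetBytesPerSec float64) (forStorage, forNetwork int) {
        forStorage = int(math.Ceil(retainedBytes / perBrokerBytes))
        // Rough approximation: each ingested byte is written `replication` times
        // and read at least once by consumers.
        netPerSec := ingressBytesPerSec * (replication + 1)
        forNetwork = int(math.Ceil(netPerSec / perBrokerNetBytesPerSec))
        return forStorage, forNetwork
    }

    func main() {
        // Hypothetical pool: 200 TB retained, 1 GB/s ingress, replication factor 3.
        // Hypothetical instance: 12 TB usable disk, 1.25 GB/s usable network.
        s, n := brokersFor(200e12, 12e12, 1e9, 3, 1.25e9)
        fmt.Printf("brokers for storage: %d, for network: %d\n", s, n) // 17 vs 4: likely the wrong type
    }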
Instance Types (AWS)
d2: the spinning rust is actually great
Good for:
- Storage/$
- Retention-biased workloads
Problems:
- Disk bandwidth far exceeds network bandwidth
- Long MTTRs
Instance Types (AWS)
h1: a modernized d2?
Good for:
- Storage/$
- Balanced, lower-retention / high-throughput workloads
Problems:
- ENA network exceeds disk throughput
- Recovery times are disk-bound
- Disk bandwidth per node is lower than on d2
Instance Types (AWS)
- r4, c5, etc. + gp2 EBS
It actually works well; EBS performance isn’t a problem
Problems:
- low EBS channel bandwidth relative to instance size
- the burden of running a distributed/replicated store, hinging it on tech that solves 2009 problems
- may want to consider Kinesis / etc.?
Data Placement
Maximum replica list entropy:
“For all partitions that a given broker holds, ensure that the partition replicas are distributed among as many other unique brokers as possible”
It’s possible to have maximal partition distribution but a low number of unique broker-to-broker relationships.
Example: broker A holds 20 partitions, yet all 20 replica sets contain only 3 other brokers.
- topicmappr expresses this as node degree distribution
- broker-to-broker relationships form a graph
- replica sets are partial adjacency lists
A graph of replicas
Given the following partition replica sets:
p0: [1, 2, 3]
p1: [2, 3, 4]
Broker 3’s adjacency list -> [1, 2, 4] (degree = 3)
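A minimal Go sketch of that idea (illustrative, not topicmappr’s actual code): treat replica sets as partial adjacency lists and derive each broker’s node degree.

    package main

    import "fmt"

    // degrees builds broker-to-broker adjacency from partition replica sets and
    // returns each broker's node degree (count of unique peer brokers).
    func degrees(replicaSets [][]int) map[int]int {
        adj := make(map[int]map[int]bool)
        for _, rs := range replicaSets {
            for _, a := range rs {
                if adj[a] == nil {
                    adj[a] = make(map[int]bool)
                }
                for _, b := range rs {
                    if a != b {
                        adj[a][b] = true // each replica set is a partial adjacency list
                    }
                }
            }
        }
        deg := make(map[int]int)
        for broker, peers := range adj {
            deg[broker] = len(peers)
        }
        return deg
    }

    func main() {
        // p0 and p1 from the example above.
        fmt.Println(degrees([][]int{{1, 2, 3}, {2, 3, 4}})) // broker 3 -> degree 3
    }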
Maintaining Pools
Broker storage balance:
- finds offload candidates: n distance below the harmonic mean of free storage
- plans relocations to the least-utilized brokers
- fair-share, first-fit descending bin-packing
- loops until no more relocations can be planned
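A simplified Go sketch of one pass of that loop, under stated assumptions (the threshold handling and data structures are illustrative, not topicmappr’s implementation): brokers sitting more than some distance below the harmonic mean of free storage offload partitions, largest first, onto the brokers with the most free space that can hold them.

    package main

    import (
        "fmt"
        "sort"
    )

    // harmonicMean returns the harmonic mean of the free-storage values.
    func harmonicMean(free map[int]float64) float64 {
        var sum float64
        for _, f := range free {
            sum += 1 / f
        }
        return float64(len(free)) / sum
    }

    // offloadCandidates returns brokers whose free storage is more than
    // `distance` below the harmonic mean of free storage.
    func offloadCandidates(free map[int]float64, distance float64) []int {
        mean := harmonicMean(free)
        var out []int
        for id, f := range free {
            if f < mean-distance {
                out = append(out, id)
            }
        }
        return out
    }

    // planRelocations assigns a candidate's partitions, largest first (descending),
    // to the least-utilized brokers that can fit them.
    func planRelocations(partitions map[string]float64, free map[int]float64) map[string]int {
        names := make([]string, 0, len(partitions))
        for p := range partitions {
            names = append(names, p)
        }
        sort.Slice(names, func(i, j int) bool { return partitions[names[i]] > partitions[names[j]] })

        plan := make(map[string]int)
        for _, p := range names {
            best, bestFree := -1, 0.0
            for id, f := range free {
                if f >= partitions[p] && f > bestFree {
                    best, bestFree = id, f
                }
            }
            if best == -1 {
                continue // nothing can take this partition
            }
            plan[p] = best
            free[best] -= partitions[p]
        }
        return plan
    }

    func main() {
        free := map[int]float64{1001: 100, 1002: 900, 1003: 800} // free GB per broker
        fmt.Println(offloadCandidates(free, 50))                 // [1001]
        fmt.Println(planRelocations(map[string]float64{"events_p0": 300, "events_p1": 100}, free))
    }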
Maintaining Pools
Broker replacements:
- When a single broker fails, how is a replacement chosen?
- Goal: retain any previously computed storage balance (via 1:1 replacements)
- Problem: dead brokers are no longer visible in ZooKeeper
- topicmappr can be provided several hot spares from varying AZs (rack.id)
- it infers a suitable replacement (the “substitution affinity” feature)
Maintaining Pools
Broker replacements - inferring replacements:
- traverse all ISRs and build the set of all rack.ids: G = {1a, 1b, 1c, 1d}
- traverse the affected ISRs and build the set of live rack.ids: L = {1a, 1c}
- build the set of suitable rack.ids to choose from: S = { x ∈ G | x ∉ L } = {1b, 1d}
- automatically choose a hot spare from 1b or 1d
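A minimal Go sketch of that set difference (illustrative only): G comes from all ISRs, L from the still-live brokers in the affected replica sets, and the spare’s rack.id must be in S = G \ L.

    package main

    import "fmt"

    // substitutionRacks returns the rack.ids suitable for a replacement broker:
    // those present in the cluster (G) but absent from the live brokers of the
    // replica sets that lost the dead broker (L), i.e. S = G \ L.
    func substitutionRacks(allRacks, liveRacksInAffectedISRs []string) []string {
        g := make(map[string]bool)
        for _, r := range allRacks {
            g[r] = true
        }
        l := make(map[string]bool)
        for _, r := range liveRacksInAffectedISRs {
            l[r] = true
        }
        var s []string
        for r := range g {
            if !l[r] {
                s = append(s, r)
            }
        }
        return s
    }

    func main() {
        // Values from the example above: G = {1a,1b,1c,1d}, L = {1a,1c}.
        fmt.Println(substitutionRacks(
            []string{"1a", "1b", "1c", "1d"},
            []string{"1a", "1c"},
        )) // S = {1b, 1d}: the hot spare is chosen from one of these racks
    }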
Outcome:
- Keeps brokers bound to specific pools
- Simple repairs that maintain storage balance and high utilization