Thoughts on Kafka Capacity Planning

  1. Thoughts on Kafka Capacity Planning – Jamie Alquiza, Sr. Software Engineer
  2. A Lot of Kafka – Multi-petabyte footprint – 10s of GB/s in sustained bandwidth – Globally distributed infrastructure – Continuous growth
  3. Motivation
  4. 1) Poor utilization at scale is $$$🔥
  5. 1) Poor utilization at scale is $$$🔥 2) Desire for predictable performance ⚡
  6. Kafka Resource Consumption
  7. Kafka Resource Consumption: CPU - Message Rate - Compression - Compaction (if used); Memory; Disk; Network
  8. Kafka Resource Consumption: CPU - Message Rate - Compression - Compaction (if used); Memory - Efficient, Steady Heap Usage - Page Cache; Disk; Network
  9. Kafka Resource Consumption: CPU - Message Rate - Compression - Compaction (if used); Memory - Efficient, Steady Heap Usage - Page Cache; Disk - Bandwidth - Storage Capacity; Network
  10. Kafka Resource Consumption: CPU - Message Rate - Compression - Compaction (if used); Memory - Efficient, Steady Heap Usage - Page Cache; Disk - Bandwidth - Storage Capacity; Network - Consumers - Replication
  11. Kafka Makes Capacity Planning Easy
  12. Kafka Makes Capacity Planning Easy. Through the lens of the Universal Scalability Law: - low contention, crosstalk - no complex queries
  13. Kafka Makes Capacity Planning Easy. Through the lens of the Universal Scalability Law: - low contention, crosstalk - no complex queries. Exposes mostly bandwidth problems: - highly sequential, batched ops - primary workload is streaming the reads/writes of bytes
  14. Kafka Makes Capacity Planning Hard
  15. Kafka Makes Capacity Planning Hard. The default tools weren’t made for scaling: - reassign-partitions focused on simple partition placement
  16. Kafka Makes Capacity Planning Hard. The default tools weren’t made for scaling: - reassign-partitions focused on simple partition placement. No administrative API: - no endpoint to inspect or manipulate resources
  17. A Scaling Model
  18. A Scaling Model. Created Kafka-Kit (open-source): - topicmappr for intelligent partition placement - registry (WIP): a Kafka gRPC/HTTP API
  19. A Scaling Model. Created Kafka-Kit (open-source): - topicmappr for intelligent partition placement - registry (WIP): a Kafka gRPC/HTTP API. Defined a simple workload pattern: - topics are bound to specific broker sets (“pools”) - multiple pools/cluster - primary drivers: disk capacity & network bandwidth
  20. A Scaling Model – topicmappr builds optimal partition -> broker pool mappings
  21. A Scaling Model – topicmappr builds optimal partition -> broker pool mappings – topic/pool sets are scaled individually
  22. A Scaling Model – topicmappr builds optimal partition -> broker pool mappings – topic/pool sets are scaled individually – topicmappr handles repairs, storage rebalancing, pool expansion
  23. A large cluster is composed of - dozens of pools - hundreds of brokers
  24. Sizing Pools
  25. Sizing Pools - Storage utilization is targeted at 60-80% depending on topic growth rate
  26. Sizing Pools - Storage utilization is targeted at 60-80% depending on topic growth rate - Network capacity depends on several factors
  27. Sizing Pools - Storage utilization is targeted at 60-80% depending on topic growth rate - Network capacity depends on several factors: consumer demand plus MTTR targets (20-40% headroom for replication)
  28. Sizing Pools Determining broker counts
  29. Sizing Pools Determining broker counts storage: n = fullRetention / (storagePerNode * 0.8)
  30. Sizing Pools Determining broker counts storage: n = fullRetention / (storagePerNode * 0.8) network: n = consumerDemand / (bwPerNode * 0.6)
  31. Sizing Pools Determining broker counts storage: n = fullRetention / (storagePerNode * 0.8) network: n = consumerDemand / (bwPerNode * 0.6) pool size = max(ceil(storage), ceil(network))
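To make the sizing arithmetic above concrete, here is a minimal Go sketch of the pool-size calculation. Parameter names (fullRetentionTB, consumerDemandGbps, and so on) are illustrative and not taken from Kafka-Kit itself.

```go
package main

import (
	"fmt"
	"math"
)

// poolSize sketches the sizing arithmetic from the slides: the storage-driven
// broker count assumes an 80% storage utilization target, the network-driven
// count assumes a 60% bandwidth utilization target, and the pool takes the
// larger of the two (rounded up).
func poolSize(fullRetentionTB, storagePerNodeTB, consumerDemandGbps, bwPerNodeGbps float64) int {
	storage := fullRetentionTB / (storagePerNodeTB * 0.8)
	network := consumerDemandGbps / (bwPerNodeGbps * 0.6)
	return int(math.Max(math.Ceil(storage), math.Ceil(network)))
}

func main() {
	// Example: 200 TB of retained data, 12 TB usable storage per broker,
	// 40 Gb/s of consumer demand, 10 Gb/s of network per broker.
	n := poolSize(200, 12, 40, 10)
	fmt.Println("brokers needed:", n) // 21 for storage vs. 7 for network -> 21
}
```

In the example values, the storage-driven count (21) far exceeds the network-driven count (7); that kind of gap is exactly what the later instance-type slides flag as a sign of the wrong instance type.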
  32. Sizing Pools (we do a pretty good job at actually hitting this)
  33. Instance Types
  34. Instance Types - If there’s a huge delta between counts required for network vs storage: probably the wrong type
  35. Instance Types - If there’s a huge delta between counts required for network vs storage: probably the wrong type - Remember: sequential, bandwidth-bound workloads
  36. Instance Types - If there’s a huge delta between counts required for network vs storage: probably the wrong type - Remember: sequential, bandwidth-bound workloads - AWS: d2, i3, h1 class
  37. Instance Types (AWS) d2: the spinning rust is actually great Good for: - Storage/$ - Retention biased workloads Problems: - Disk bw far exceeds network - Long MTTRs
  38. Instance Types (AWS) h1: a modernized d2? Good for: - Storage/$ - Balanced, lower retention / high throughput workloads Problems: - ENA network exceeds disk throughput - Recovery times are disk-bound - Disk bw / node < d2
  39. Instance Types (AWS) i3: bandwidth monster Good for: - Low MTTRs - Concurrent i/o outside the page cache Problems: - storage/$
  40. Instance Types (AWS) - r4, c5, etc. + gp2 EBS. It actually works well; EBS perf isn’t a problem
  41. Instance Types (AWS) - r4, c5, etc. + gp2 EBS. It actually works well; EBS perf isn’t a problem. Problems: - low EBS channel bw in relation to instance size - the burden of running a distributed/replicated store, hinging it on tech that solves 2009 problems - may want to consider Kinesis, etc.?
  42. Data Placement
  43. Data Placement topicmappr optimizes for: - maximum leadership distribution
  44. Data Placement topicmappr optimizes for: - maximum leadership distribution - replica rack.id isolation
  45. Data Placement topicmappr optimizes for: - maximum leadership distribution - replica rack.id isolation - maximum replica list entropy
  46. Data Placement Maximum replica list entropy(?) “For all partitions that a given broker holds, ensuring that the partition replicas are distributed among as many other unique brokers as possible”
  47. Data Placement Maximum replica list entropy It’s possible to have maximal partition distribution but a low number of unique broker-to-broker relationships
  48. Data Placement Maximum replica list entropy It’s possible to have maximal partition distribution but a low number of unique broker-to-broker relationships Example: broker A holds 20 partitions, all 20 replica sets contain only 3 other brokers
  49. Data Placement Maximum replica list entropy! - topicmappr expresses this as node degree distribution
  50. Data Placement Maximum replica list entropy! - topicmappr expresses this as node degree distribution - broker-to-broker relationships: it’s a graph
  51. Data Placement Maximum replica list entropy! - topicmappr expresses this as node degree distribution - broker-to-broker relationships: it’s a graph - replica sets are partial adjacency lists
  52. Data Placement A graph of replicas. Given the following partition replica sets: p0: [1, 2, 3] p1: [2, 3, 4]
  53. Data Placement A graph of replicas. Given the following partition replica sets: p0: [1, 2, 3] p1: [2, 3, 4] Broker 3’s adjacency list -> [1, 2, 4]
  54. Data Placement A graph of replicas. Given the following partition replica sets: p0: [1, 2, 3] p1: [2, 3, 4] Broker 3’s adjacency list -> [1, 2, 4] (degree = 3)
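A minimal Go sketch of the graph interpretation above: each replica set is treated as a partial adjacency list, and each broker's degree is the number of unique peers it shares replicas with. This illustrates the idea only and is not topicmappr's actual code.

```go
package main

import "fmt"

// degrees builds per-broker adjacency sets from partition replica sets:
// every pair of brokers that co-hosts a partition gets an edge, and a
// broker's degree is the count of its unique peers.
func degrees(replicaSets [][]int) map[int]int {
	adj := make(map[int]map[int]struct{})
	for _, rs := range replicaSets {
		for _, a := range rs {
			if adj[a] == nil {
				adj[a] = make(map[int]struct{})
			}
			for _, b := range rs {
				if a != b {
					adj[a][b] = struct{}{}
				}
			}
		}
	}
	deg := make(map[int]int)
	for b, peers := range adj {
		deg[b] = len(peers)
	}
	return deg
}

func main() {
	// The example from the slides: p0 -> [1, 2, 3], p1 -> [2, 3, 4].
	// Broker 3's adjacency list is [1, 2, 4], so its degree is 3.
	fmt.Println(degrees([][]int{{1, 2, 3}, {2, 3, 4}}))
}
```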
  55. Data Placement Maximizing replica list entropy (is good)
  56. Data Placement Maximizing replica list entropy (is good) In broker failure/replacements: - probabilistically increases replication sources - faster, lower impact recoveries
  57. Data Placement topicmappr optimizes for: - maximum leadership distribution ✅ - replica rack.id isolation ✅ - maximum replica list entropy ✅
  58. Maintaining Pools
  59. Maintaining Pools Most common tasks: - ensuring storage balance - simple broker replacements
  60. Maintaining Pools Most common tasks: - ensuring storage balance - simple broker replacements Both of these are (also) done with topicmappr
  61. Maintaining Pools Broker storage balance
  62. Maintaining Pools Broker storage balance - finds offload candidates: n distance below harmonic mean storage free
  63. Maintaining Pools Broker storage balance
  64. Maintaining Pools Broker storage balance
  65. Maintaining Pools Broker storage balance - finds offload candidates: n distance below harmonic mean storage free - plans relocations to least-utilized brokers
  66. Maintaining Pools Broker storage balance - finds offload candidates: n distance below harmonic mean storage free - plans relocations to least-utilized brokers - fair-share, first-fit descending bin-packing
  67. Maintaining Pools Broker storage balance - finds offload candidates: n distance below harmonic mean storage free - plans relocations to least-utilized brokers - fair-share, first-fit descending bin-packing - loops until no more relocations can be planned
  68. Maintaining Pools Broker storage balance (results)
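As a rough illustration of the candidate-selection step, the sketch below computes the harmonic mean of free storage across a pool and flags brokers sitting more than some threshold below it. The threshold value and data shapes are assumptions for the example, and the subsequent first-fit-descending relocation planning is omitted.

```go
package main

import "fmt"

// offloadCandidates sketches the candidate-selection step from the slides:
// brokers whose free storage is more than `threshold` below the harmonic mean
// of free storage across the pool are selected to offload partitions.
// (Assumes every broker reports non-zero free storage; types and the
// threshold are illustrative, not topicmappr's.)
func offloadCandidates(freeGB map[string]float64, threshold float64) []string {
	// Harmonic mean: n / sum(1/x_i).
	var sumInv float64
	for _, free := range freeGB {
		sumInv += 1 / free
	}
	hmean := float64(len(freeGB)) / sumInv

	var candidates []string
	for broker, free := range freeGB {
		if free < hmean-threshold {
			candidates = append(candidates, broker)
		}
	}
	return candidates
}

func main() {
	// Free storage per broker in GB; broker-3 is clearly over-utilized.
	pool := map[string]float64{"broker-1": 900, "broker-2": 850, "broker-3": 300}
	fmt.Println(offloadCandidates(pool, 100)) // [broker-3]
}
```

From there, as the slides describe, relocations are planned onto the least-utilized brokers with fair-share, first-fit descending bin-packing, looping until no further relocations can be planned.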
  69. Maintaining Pools Broker replacements
  70. Maintaining Pools Broker replacements - When a single broker fails, how is a replacement chosen?
  71. Maintaining Pools Broker replacements - When a single broker fails, how is a replacement chosen? - Goal: retain any previously computed storage balance (via 1:1 replacements)
  72. Maintaining Pools Broker replacements - When a single broker fails, how is a replacement chosen? - Goal: retain any previously computed storage balance (via 1:1 replacements) - Problem: dead brokers no longer visible in ZK
  73. Maintaining Pools Broker replacements - topicmappr can be provided several hot spares from varying AZs (rack.id)
  74. Maintaining Pools Broker replacements - topicmappr can be provided several hot spares from varying AZs (rack.id) - infers a suitable replacement (“substitution affinity” feature)
  75. Maintaining Pools Broker replacements - inferring replacements - traverse all ISRs, build a set of all rack.ids: G = {1a,1b,1c,1d}
  76. Maintaining Pools Broker replacements - inferring replacements - traverse all ISRs, build a set of all rack.ids: G = {1a,1b,1c,1d} - traverse affected ISRs, build a set of live rack.ids: L = {1a,1c}
  77. Maintaining Pools Broker replacements - inferring replacements - Build a set of suitable rack.ids to choose from: S = { x ∈ G | x ∉ L }
  78. Maintaining Pools Broker replacements - inferring replacements - Build a set of suitable rack.ids to choose from: S = { x ∈ G | x ∉ L } - S = {1b,1d} - automatically chooses a hot spare from 1b or 1d
  79. Maintaining Pools Broker replacements - inferring replacements Outcome: - Keeps brokers bound to specific pools - Simple repairs that maintain storage balance, high utilization
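The set math behind substitution affinity is small enough to sketch directly: G is the set of all rack.ids seen across ISRs, L is the set of live rack.ids in the affected ISRs, and the suitable set is S = G \ L. Function and variable names below are chosen for the example rather than taken from topicmappr.

```go
package main

import "fmt"

// suitableRackIDs computes S = G \ L: rack.ids present somewhere in the
// ISRs (all) but not among the live rack.ids of the affected ISRs (live).
// A hot spare is then chosen from one of these rack.ids.
func suitableRackIDs(all, live []string) []string {
	liveSet := make(map[string]struct{}, len(live))
	for _, id := range live {
		liveSet[id] = struct{}{}
	}
	var s []string
	for _, id := range all {
		if _, ok := liveSet[id]; !ok {
			s = append(s, id)
		}
	}
	return s
}

func main() {
	// The example from the slides: G = {1a,1b,1c,1d}, L = {1a,1c} -> S = {1b,1d}.
	fmt.Println(suitableRackIDs([]string{"1a", "1b", "1c", "1d"}, []string{"1a", "1c"}))
}
```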
  80. Scaling Pools
  81. Scaling Pools When: >90% storage utilization in 48h
  82. Scaling Pools How: - add brokers to pool - run a rebalance - autothrottle takes over
  83. autothrottle is a service that dynamically manages replication rates
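Assuming the ">90% storage utilization in 48h" trigger means a simple linear forecast (at the current growth rate, will the pool cross 90% within the next 48 hours?), a minimal check might look like the sketch below. The interpretation, the forecasting method, and all names are assumptions for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// needsExpansion is a sketch of the scaling trigger described in the slides,
// under the assumption that it is a linear projection of storage utilization
// over the given horizon against a 90% ceiling.
func needsExpansion(currentUtil, growthPerHour float64, horizon time.Duration) bool {
	projected := currentUtil + growthPerHour*horizon.Hours()
	return projected > 0.90
}

func main() {
	// 82% utilized, growing 0.25% per hour: projected 94% in 48h -> expand.
	fmt.Println(needsExpansion(0.82, 0.0025, 48*time.Hour)) // true
}
```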
  84. Scaling Pools Increasing capacity also improves storage balance
  85. What’s Next
  86. What’s Next - precursor to fully automated capacity management
  87. What’s Next - precursor to fully automated capacity management - continued growth, dozens more clusters
  88. What’s Next - precursor to fully automated capacity management - continued growth, dozens more clusters - new infrastructure
  89. What’s Next - precursor to fully automated capacity management - continued growth, dozens more clusters - new infrastructure - (we’re hiring)
  90. Thank you. Jamie Alquiza, Sr. Software Engineer – twitter.com/jamiealquiza