SlideShare une entreprise Scribd logo
1  sur  65
Télécharger pour lire hors ligne
Introduction to Kafka Cruise Control
Viktor Somogyi-Vass
Viktor Somogyi-Vass
viktor.somogyi@cloudera.com
3
© 2023 Cloudera, Inc. All rights reserved.
Operating Kafka clusters are hard.
4
© 2023 Cloudera, Inc. All rights reserved.
5
© 2023 Cloudera, Inc. All rights reserved.
1
Surprises
There can be data center
outages, maintenance can
go wrong and upgrades can
go wrong.
6
© 2023 Cloudera, Inc. All rights reserved.
1
Surprises
There can be data center
outages, maintenance can
go wrong and upgrades can
go wrong.
2
Scaling
New brokers won’t get
populated. What to put on
them? Where to put
partitions when
decommissioning brokers?
7
© 2023 Cloudera, Inc. All rights reserved.
1
Surprises
There can be data center
outages, maintenance can
go wrong and upgrades can
go wrong.
3
Performance
Brokers can be overloaded
or even underloaded.
Rebalances can ruin your
cluster.
2
Scaling
New brokers won’t get
populated. What to put on
them? Where to put
partitions when
decommissioning brokers?
8
© 2023 Cloudera, Inc. All rights reserved.
How do we solve these problems?
9
© 2023 Cloudera, Inc. All rights reserved.
10
© 2023 Cloudera, Inc. All rights reserved.
SOME HIGHLIGHTS
11
© 2023 Cloudera, Inc. All rights reserved.
SOME HIGHLIGHTS
SELF-HEALING
By rebalances
Detect—failures and anomalies are
detected automatically
Rebalance—healing will be done via
rebalancing partitions to healthy
brokers
12
© 2023 Cloudera, Inc. All rights reserved.
SOME HIGHLIGHTS
SELF-HEALING
By rebalances
ADMIN
Scaling and maintenance
Detect—failures and anomalies are
detected automatically
Rebalance—healing will be done via
rebalancing partitions to healthy
brokers
Upscaling—brokers can be added to
Cruise Control and populated with
partitions
Downscaling—brokers can be
removed from the cluster and
partitions will be reassigned
optimally
13
© 2023 Cloudera, Inc. All rights reserved.
SOME HIGHLIGHTS
SELF-HEALING
By rebalances
ADMIN
Scaling and maintenance
MONITORING
Even load everywhere
Detect—failures and anomalies are
detected automatically
Rebalance—healing will be done via
rebalancing partitions to healthy
brokers
Upscaling—brokers can be added to
Cruise Control and populated with
partitions
Downscaling—brokers can be
removed from the cluster and
partitions will be reassigned
optimally
Metrics—are collected periodically
to ensure an up-to-date view of the
cluster
Estimation—partitions level
utilization is estimated with a linear
model
14
© 2023 Cloudera, Inc. All rights reserved.
How does it work?
15
© 2023 Cloudera, Inc. All rights reserved.
ARCHITECTURE
16
© 2023 Cloudera, Inc. All rights reserved.
ARCHITECTURE
17
© 2023 Cloudera, Inc. All rights reserved.
ARCHITECTURE
18
© 2023 Cloudera, Inc. All rights reserved.
ARCHITECTURE
19
© 2023 Cloudera, Inc. All rights reserved.
ARCHITECTURE
20
© 2023 Cloudera, Inc. All rights reserved.
ARCHITECTURE
21
© 2023 Cloudera, Inc. All rights reserved.
THE LOAD MONITOR
22
© 2023 Cloudera, Inc. All rights reserved.
THE LOAD MONITOR
23
© 2023 Cloudera, Inc. All rights reserved.
THE LOAD MONITOR
24
© 2023 Cloudera, Inc. All rights reserved.
THE LOAD MONITOR
25
© 2023 Cloudera, Inc. All rights reserved.
THE LOAD MONITOR
26
© 2023 Cloudera, Inc. All rights reserved.
THE LOAD MONITOR
27
© 2023 Cloudera, Inc. All rights reserved.
THE ANALYZER
The brain
28
© 2023 Cloudera, Inc. All rights reserved.
THE ANALYZER
The brain
Goals
Define the
characteristics of an
optimal aspect
29
© 2023 Cloudera, Inc. All rights reserved.
THE ANALYZER
The brain
Goals
Define the
characteristics of an
optimal aspect
Heuristic model
An iterative model
defines how it
reaches the optimal
load
30
© 2023 Cloudera, Inc. All rights reserved.
THE ANALYZER
The brain
For each goal g in the goal list ordered by priority {
For each broker b {
while b does not meet g’s requirement {
For each replica r on b sorted by the resource utilization density {
Move r (or the leadership of r) to another eligible broker b’ so b’
still satisfies g and all the satisfied goals
Finish the optimization for b once g is satisfied.
}
Fail the optimization if g is a hard goal and is not satisfied for b
}
}
Add g to the satisfied goals
}
Goals
Define the
characteristics of an
optimal aspect
Heuristic model
An iterative model
defines how it
reaches the optimal
load
31
© 2023 Cloudera, Inc. All rights reserved.
THE ANOMALY DETECTOR
32
© 2023 Cloudera, Inc. All rights reserved.
THE ANOMALY DETECTOR
Broker failure
A non-empty broker
crashes. Results in
under-replication.
33
© 2023 Cloudera, Inc. All rights reserved.
THE ANOMALY DETECTOR
Disk failure
A non-empty disk
dies. Some replicas
might get lost.
Broker failure
A non-empty broker
crashes. Results in
under-replication.
34
© 2023 Cloudera, Inc. All rights reserved.
THE ANOMALY DETECTOR
Goal violation
An optimization goal
is violated.
Unbalanced cluster.
Disk failure
A non-empty disk
dies. Some replicas
might get lost.
Broker failure
A non-empty broker
crashes. Results in
under-replication.
35
© 2023 Cloudera, Inc. All rights reserved.
THE ANOMALY DETECTOR
Goal violation
An optimization goal
is violated.
Unbalanced cluster.
Disk failure
A non-empty disk
dies. Some replicas
might get lost.
Broker failure
A non-empty broker
crashes. Results in
under-replication.
Fixable
36
© 2023 Cloudera, Inc. All rights reserved.
THE ANOMALY DETECTOR
Goal violation
An optimization goal
is violated.
Unbalanced cluster.
Disk failure
A non-empty disk
dies. Some replicas
might get lost.
Metric anomaly
One of the collected
metric observes an
unexpected value.
Broker failure
A non-empty broker
crashes. Results in
under-replication.
Fixable
37
© 2023 Cloudera, Inc. All rights reserved.
THE ANOMALY DETECTOR
Goal violation
An optimization goal
is violated.
Unbalanced cluster.
Disk failure
A non-empty disk
dies. Some replicas
might get lost.
Metric anomaly
One of the collected
metric observes an
unexpected value.
Topic anomaly
One or more topics
violate user-defined
properties.
Broker failure
A non-empty broker
crashes. Results in
under-replication.
Fixable
38
© 2023 Cloudera, Inc. All rights reserved.
THE ANOMALY DETECTOR
Goal violation
An optimization goal
is violated.
Unbalanced cluster.
Disk failure
A non-empty disk
dies. Some replicas
might get lost.
Metric anomaly
One of the collected
metric observes an
unexpected value.
Topic anomaly
One or more topics
violate user-defined
properties.
Broker failure
A non-empty broker
crashes. Results in
under-replication.
Fixable Not
fixable
39
© 2023 Cloudera, Inc. All rights reserved.
EXECUTOR
40
© 2023 Cloudera, Inc. All rights reserved.
EXECUTOR
Long running
Depending on the size
of the executed task.
41
© 2023 Cloudera, Inc. All rights reserved.
EXECUTOR
Interruptible
At any point it can be
stopped safely. No
“undo” functionality.
Long running
Depending on the size
of the executed task.
42
© 2023 Cloudera, Inc. All rights reserved.
EXECUTOR
Reassignment
Optimization is done
by reassigning
partitions between
disks, brokers and
adjusting leadership
Interruptible
At any point it can be
stopped safely. No
“undo” functionality.
Long running
Depending on the size
of the executed task.
43
© 2023 Cloudera, Inc. All rights reserved.
EXECUTOR
Reassignment
Optimization is done
by reassigning
partitions between
disks, brokers and
adjusting leadership
Concurrent
Reassignment is
concurrent, limits are
set either manually or
by the concurrency
adjuster
Interruptible
At any point it can be
stopped safely. No
“undo” functionality.
Long running
Depending on the size
of the executed task.
44
© 2023 Cloudera, Inc. All rights reserved.
So how do we use it?
46
© 2023 Cloudera, Inc. All rights reserved.
What else can it do?
47
© 2023 Cloudera, Inc. All rights reserved.
SLOW BROKER FINDER
48
© 2023 Cloudera, Inc. All rights reserved.
SLOW BROKER FINDER
DETECTION
With metrics
Log flush—broker log flush 99.9th
percentile is used, both absolute
and relative to incoming traffic
Scoring—brokers have score, every
anomaly increases it and every
normal behavior decreases it
49
© 2023 Cloudera, Inc. All rights reserved.
SLOW BROKER FINDER
DETECTION
With metrics
REMEDIATION
Demoting and rebalance
Log flush—broker log flush 99.9th
percentile is used, both absolute
and relative to incoming traffic
Scoring—brokers have score, every
anomaly increases it and every
normal behavior decreases it
Demoting—brokers will be demoted
as a first attempt
Removal—slow brokers can be
removed from the cluster if the
problem persists after demotion
50
© 2023 Cloudera, Inc. All rights reserved.
PROVISIONER
51
© 2023 Cloudera, Inc. All rights reserved.
PROVISIONER
RIGHTSIZING
With API
Underprovisioned—the cluster lacks
certain kind of resources
Overprovisioned—by removing
resources and rebalancing it can
achieve some cost saving
52
© 2023 Cloudera, Inc. All rights reserved.
PROVISIONER
RIGHTSIZING
With API
Underprovisioned—the cluster lacks
certain kind of resources
Overprovisioned—by removing
resources and rebalancing it can
achieve some cost saving
RECOMMENDATION
With goals
Resource—a goal can request a
resource, like broker, disk to be
added or removed
Constraints—every goal can give
constraints, e.g. racks for which
brokers should not be added
53
© 2023 Cloudera, Inc. All rights reserved.
PROVISIONER
IMPLEMENTATION
Provider dependent
API—there is an API that can be
used as a basis for your
implementation
Providers—for all providers like AWS
or Azure you need to add glue code
to acquire or release resources
RIGHTSIZING
With API
Underprovisioned—the cluster lacks
certain kind of resources
Overprovisioned—by removing
resources and rebalancing it can
achieve some cost saving
RECOMMENDATION
With goals
Resource—a goal can request a
resource, like broker, disk to be
added or removed
Constraints—every goal can give
constraints, e.g. racks for which
brokers should not be added
54
© 2023 Cloudera, Inc. All rights reserved.
CONCURRENCY ADJUSTER
Inter-broker replica reassignment limits aren’t easy
55
© 2023 Cloudera, Inc. All rights reserved.
CONCURRENCY ADJUSTER
MANUAL LIMITS
Challenging
Destructive—if the manually set
limit is too high, then it can be fast
but also overwhelm the cluster
Slow—If the limit is too small then
rebalances can last for a long time
Inter-broker replica reassignment limits aren’t easy
56
© 2023 Cloudera, Inc. All rights reserved.
CONCURRENCY ADJUSTER
MANUAL LIMITS
Challenging
Destructive—if the manually set
limit is too high, then it can be fast
but also overwhelm the cluster
Slow—If the limit is too small then
rebalances can last for a long time
FEEDBACK LOOP
Adaptive limit
Metrics—metrics are collected to
assess whether rebalance
concurrency needs adjustments
Limits—if metrics are over a
threshold then concurrency will
adjust with AIMD algorithm
Inter-broker replica reassignment limits aren’t easy
57
© 2023 Cloudera, Inc. All rights reserved.
CONCURRENCY ADJUSTER
RESULTS
AIMD
Automated—no need for constant
oversight for setting concurrency in
a fluctuating traffic environment
Fast—rebalances will complete
faster due to the optimal limit
Stable—clients won’t experience any
side effects caused by rebalances
MANUAL LIMITS
Challenging
Destructive—if the manually set
limit is too high, then it can be fast
but also overwhelm the cluster
Slow—If the limit is too small then
rebalances can last for a long time
FEEDBACK LOOP
Adaptive limit
Metrics—metrics are collected to
assess whether rebalance
concurrency needs adjustments
Limits—if metrics are over a
threshold then concurrency will
adjust with AIMD algorithm
Inter-broker replica reassignment limits aren’t easy
58
© 2023 Cloudera, Inc. All rights reserved.
Security matters
59
© 2023 Cloudera, Inc. All rights reserved.
SECURITY IN CRUISE CONTROL
60
© 2023 Cloudera, Inc. All rights reserved.
SECURITY IN CRUISE CONTROL
Basic Auth
The simple basic
authentication that
sends username and
password
61
© 2023 Cloudera, Inc. All rights reserved.
SECURITY IN CRUISE CONTROL
SPNEGO
Kerberos over HTTP.
Tokens are
negotiated via the
SPNEGO protocol.
Basic Auth
The simple basic
authentication that
sends username and
password
62
© 2023 Cloudera, Inc. All rights reserved.
SECURITY IN CRUISE CONTROL
SPNEGO
Kerberos over HTTP.
Tokens are
negotiated via the
SPNEGO protocol.
Basic Auth
The simple basic
authentication that
sends username and
password
Trusted Proxy
If a process acts on
behalf of a user, the
doAs header will
forward the
username.
63
© 2023 Cloudera, Inc. All rights reserved.
SECURITY IN CRUISE CONTROL
SPNEGO
Kerberos over HTTP.
Tokens are
negotiated via the
SPNEGO protocol.
Basic Auth
The simple basic
authentication that
sends username and
password
Trusted Proxy
If a process acts on
behalf of a user, the
doAs header will
forward the
username.
TLS/HTTPS
Communication is
secured via HTTPS to
the server and TLS to
Zookeeper or Kafka
64
© 2023 Cloudera, Inc. All rights reserved.
SECURITY IN CRUISE CONTROL
SPNEGO
Kerberos over HTTP.
Tokens are
negotiated via the
SPNEGO protocol.
Basic Auth
The simple basic
authentication that
sends username and
password
Trusted Proxy
If a process acts on
behalf of a user, the
doAs query param
will forward the
username.
TLS/HTTPS
Communication is
secured via HTTPS to
the server and TLS to
Zookeeper or Kafka
Authorization
Simple model with
viewer, user and
admin roles.
THANK YOU

Contenu connexe

Tendances

Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
SANG WON PARK
 
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
Lucas Jellema
 

Tendances (20)

Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka Streams
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database System
 
Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...
Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...
Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 
Deploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and KubernetesDeploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and Kubernetes
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
 
Apache kafka performance(throughput) - without data loss and guaranteeing dat...
Apache kafka performance(throughput) - without data loss and guaranteeing dat...Apache kafka performance(throughput) - without data loss and guaranteeing dat...
Apache kafka performance(throughput) - without data loss and guaranteeing dat...
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
 
Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
 
Unified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache FlinkUnified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache Flink
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Consumer offset management in Kafka
Consumer offset management in KafkaConsumer offset management in Kafka
Consumer offset management in Kafka
 
Kafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platformKafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platform
 
Apache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyondApache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyond
 
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 

Similaire à An Introduction to Kafka Cruise Control with Viktor Somogyi-Vass

MySQL High Availability: Managing Farms of Distributed Servers (MySQL Fabric)
MySQL High Availability: Managing Farms of Distributed Servers (MySQL Fabric)MySQL High Availability: Managing Farms of Distributed Servers (MySQL Fabric)
MySQL High Availability: Managing Farms of Distributed Servers (MySQL Fabric)
Alfranio Júnior
 
Mitigating Kafka Broker ‘Gray’ Failures For Key Based Partitioners With Parti...
Mitigating Kafka Broker ‘Gray’ Failures For Key Based Partitioners With Parti...Mitigating Kafka Broker ‘Gray’ Failures For Key Based Partitioners With Parti...
Mitigating Kafka Broker ‘Gray’ Failures For Key Based Partitioners With Parti...
HostedbyConfluent
 
Discover Aura Workshop (12.5.23).pdf
Discover Aura Workshop (12.5.23).pdfDiscover Aura Workshop (12.5.23).pdf
Discover Aura Workshop (12.5.23).pdf
Neo4j
 
Takeaways, Lessons, and Insights From the Cloud Performance Report: 2022 Edition
Takeaways, Lessons, and Insights From the Cloud Performance Report: 2022 EditionTakeaways, Lessons, and Insights From the Cloud Performance Report: 2022 Edition
Takeaways, Lessons, and Insights From the Cloud Performance Report: 2022 Edition
ThousandEyes
 

Similaire à An Introduction to Kafka Cruise Control with Viktor Somogyi-Vass (20)

Meet the Committers Webinar_ Lab Preparation
Meet the Committers Webinar_ Lab PreparationMeet the Committers Webinar_ Lab Preparation
Meet the Committers Webinar_ Lab Preparation
 
Takeaways, Lessons, and Insights From the Cloud Performance Report: 2022 Edition
Takeaways, Lessons, and Insights From the Cloud Performance Report: 2022 EditionTakeaways, Lessons, and Insights From the Cloud Performance Report: 2022 Edition
Takeaways, Lessons, and Insights From the Cloud Performance Report: 2022 Edition
 
Keep Your Kafka Cloud Costs in Check with Showbacks
Keep Your Kafka Cloud Costs in Check with ShowbacksKeep Your Kafka Cloud Costs in Check with Showbacks
Keep Your Kafka Cloud Costs in Check with Showbacks
 
Hadoop security implementationon 20171003
Hadoop security implementationon 20171003Hadoop security implementationon 20171003
Hadoop security implementationon 20171003
 
Security implementation on hadoop
Security implementation on hadoopSecurity implementation on hadoop
Security implementation on hadoop
 
MySQL High Availability: Managing Farms of Distributed Servers (MySQL Fabric)
MySQL High Availability: Managing Farms of Distributed Servers (MySQL Fabric)MySQL High Availability: Managing Farms of Distributed Servers (MySQL Fabric)
MySQL High Availability: Managing Farms of Distributed Servers (MySQL Fabric)
 
Ccna sv2 instructor_ppt_ch3
Ccna sv2 instructor_ppt_ch3Ccna sv2 instructor_ppt_ch3
Ccna sv2 instructor_ppt_ch3
 
Takeaways, Lessons, and Insights From the Cloud Performance Report: 2022 Edit...
Takeaways, Lessons, and Insights From the Cloud Performance Report: 2022 Edit...Takeaways, Lessons, and Insights From the Cloud Performance Report: 2022 Edit...
Takeaways, Lessons, and Insights From the Cloud Performance Report: 2022 Edit...
 
Aerospike O&M Webinar.pptx
Aerospike O&M Webinar.pptxAerospike O&M Webinar.pptx
Aerospike O&M Webinar.pptx
 
002 Introducing Neo4j 5 for Administrators - NODES2022 AMERICAS Beginner 2 - ...
002 Introducing Neo4j 5 for Administrators - NODES2022 AMERICAS Beginner 2 - ...002 Introducing Neo4j 5 for Administrators - NODES2022 AMERICAS Beginner 2 - ...
002 Introducing Neo4j 5 for Administrators - NODES2022 AMERICAS Beginner 2 - ...
 
Making your PostgreSQL Database Highly Available
Making your PostgreSQL Database Highly AvailableMaking your PostgreSQL Database Highly Available
Making your PostgreSQL Database Highly Available
 
Discovering exoplanets with Deep Leaning
Discovering exoplanets with Deep LeaningDiscovering exoplanets with Deep Leaning
Discovering exoplanets with Deep Leaning
 
Mitigating Kafka Broker ‘Gray’ Failures For Key Based Partitioners With Parti...
Mitigating Kafka Broker ‘Gray’ Failures For Key Based Partitioners With Parti...Mitigating Kafka Broker ‘Gray’ Failures For Key Based Partitioners With Parti...
Mitigating Kafka Broker ‘Gray’ Failures For Key Based Partitioners With Parti...
 
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfOwn your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
 
Best Practices For Workflow
Best Practices For WorkflowBest Practices For Workflow
Best Practices For Workflow
 
Discover Aura Workshop (12.5.23).pdf
Discover Aura Workshop (12.5.23).pdfDiscover Aura Workshop (12.5.23).pdf
Discover Aura Workshop (12.5.23).pdf
 
Introduction to ThousandEyes
Introduction to ThousandEyesIntroduction to ThousandEyes
Introduction to ThousandEyes
 
New ThousandEyes Product Features and Release Highlights: October 2023
New ThousandEyes Product Features and Release Highlights: October 2023New ThousandEyes Product Features and Release Highlights: October 2023
New ThousandEyes Product Features and Release Highlights: October 2023
 
Deep Dive into MySQL InnoDB Cluster Read Scale-out Capabilities.pdf
Deep Dive into MySQL InnoDB Cluster Read Scale-out Capabilities.pdfDeep Dive into MySQL InnoDB Cluster Read Scale-out Capabilities.pdf
Deep Dive into MySQL InnoDB Cluster Read Scale-out Capabilities.pdf
 
Takeaways, Lessons, and Insights From the Cloud Performance Report: 2022 Edition
Takeaways, Lessons, and Insights From the Cloud Performance Report: 2022 EditionTakeaways, Lessons, and Insights From the Cloud Performance Report: 2022 Edition
Takeaways, Lessons, and Insights From the Cloud Performance Report: 2022 Edition
 

Plus de HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
HostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
HostedbyConfluent
 

Plus de HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Dernier

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Dernier (20)

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 

An Introduction to Kafka Cruise Control with Viktor Somogyi-Vass

  • 1. Introduction to Kafka Cruise Control Viktor Somogyi-Vass
  • 3. 3 © 2023 Cloudera, Inc. All rights reserved. Operating Kafka clusters are hard.
  • 4. 4 © 2023 Cloudera, Inc. All rights reserved.
  • 5. 5 © 2023 Cloudera, Inc. All rights reserved. 1 Surprises There can be data center outages, maintenance can go wrong and upgrades can go wrong.
  • 6. 6 © 2023 Cloudera, Inc. All rights reserved. 1 Surprises There can be data center outages, maintenance can go wrong and upgrades can go wrong. 2 Scaling New brokers won’t get populated. What to put on them? Where to put partitions when decommissioning brokers?
  • 7. 7 © 2023 Cloudera, Inc. All rights reserved. 1 Surprises There can be data center outages, maintenance can go wrong and upgrades can go wrong. 3 Performance Brokers can be overloaded or even underloaded. Rebalances can ruin your cluster. 2 Scaling New brokers won’t get populated. What to put on them? Where to put partitions when decommissioning brokers?
  • 8. 8 © 2023 Cloudera, Inc. All rights reserved. How do we solve these problems?
  • 9. 9 © 2023 Cloudera, Inc. All rights reserved.
  • 10. 10 © 2023 Cloudera, Inc. All rights reserved. SOME HIGHLIGHTS
  • 11. 11 © 2023 Cloudera, Inc. All rights reserved. SOME HIGHLIGHTS SELF-HEALING By rebalances Detect—failures and anomalies are detected automatically Rebalance—healing will be done via rebalancing partitions to healthy brokers
  • 12. 12 © 2023 Cloudera, Inc. All rights reserved. SOME HIGHLIGHTS SELF-HEALING By rebalances ADMIN Scaling and maintenance Detect—failures and anomalies are detected automatically Rebalance—healing will be done via rebalancing partitions to healthy brokers Upscaling—brokers can be added to Cruise Control and populated with partitions Downscaling—brokers can be removed from the cluster and partitions will be reassigned optimally
  • 13. 13 © 2023 Cloudera, Inc. All rights reserved. SOME HIGHLIGHTS SELF-HEALING By rebalances ADMIN Scaling and maintenance MONITORING Even load everywhere Detect—failures and anomalies are detected automatically Rebalance—healing will be done via rebalancing partitions to healthy brokers Upscaling—brokers can be added to Cruise Control and populated with partitions Downscaling—brokers can be removed from the cluster and partitions will be reassigned optimally Metrics—are collected periodically to ensure an up-to-date view of the cluster Estimation—partitions level utilization is estimated with a linear model
  • 14. 14 © 2023 Cloudera, Inc. All rights reserved. How does it work?
  • 15. 15 © 2023 Cloudera, Inc. All rights reserved. ARCHITECTURE
  • 16. 16 © 2023 Cloudera, Inc. All rights reserved. ARCHITECTURE
  • 17. 17 © 2023 Cloudera, Inc. All rights reserved. ARCHITECTURE
  • 18. 18 © 2023 Cloudera, Inc. All rights reserved. ARCHITECTURE
  • 19. 19 © 2023 Cloudera, Inc. All rights reserved. ARCHITECTURE
  • 20. 20 © 2023 Cloudera, Inc. All rights reserved. ARCHITECTURE
  • 21. 21 © 2023 Cloudera, Inc. All rights reserved. THE LOAD MONITOR
  • 22. 22 © 2023 Cloudera, Inc. All rights reserved. THE LOAD MONITOR
  • 23. 23 © 2023 Cloudera, Inc. All rights reserved. THE LOAD MONITOR
  • 24. 24 © 2023 Cloudera, Inc. All rights reserved. THE LOAD MONITOR
  • 25. 25 © 2023 Cloudera, Inc. All rights reserved. THE LOAD MONITOR
  • 26. 26 © 2023 Cloudera, Inc. All rights reserved. THE LOAD MONITOR
  • 27. 27 © 2023 Cloudera, Inc. All rights reserved. THE ANALYZER The brain
  • 28. 28 © 2023 Cloudera, Inc. All rights reserved. THE ANALYZER The brain Goals Define the characteristics of an optimal aspect
  • 29. 29 © 2023 Cloudera, Inc. All rights reserved. THE ANALYZER The brain Goals Define the characteristics of an optimal aspect Heuristic model An iterative model defines how it reaches the optimal load
  • 30. 30 © 2023 Cloudera, Inc. All rights reserved. THE ANALYZER The brain For each goal g in the goal list ordered by priority { For each broker b { while b does not meet g’s requirement { For each replica r on b sorted by the resource utilization density { Move r (or the leadership of r) to another eligible broker b’ so b’ still satisfies g and all the satisfied goals Finish the optimization for b once g is satisfied. } Fail the optimization if g is a hard goal and is not satisfied for b } } Add g to the satisfied goals } Goals Define the characteristics of an optimal aspect Heuristic model An iterative model defines how it reaches the optimal load
  • 31. 31 © 2023 Cloudera, Inc. All rights reserved. THE ANOMALY DETECTOR
  • 32. 32 © 2023 Cloudera, Inc. All rights reserved. THE ANOMALY DETECTOR Broker failure A non-empty broker crashes. Results in under-replication.
  • 33. 33 © 2023 Cloudera, Inc. All rights reserved. THE ANOMALY DETECTOR Disk failure A non-empty disk dies. Some replicas might get lost. Broker failure A non-empty broker crashes. Results in under-replication.
  • 34. 34 © 2023 Cloudera, Inc. All rights reserved. THE ANOMALY DETECTOR Goal violation An optimization goal is violated. Unbalanced cluster. Disk failure A non-empty disk dies. Some replicas might get lost. Broker failure A non-empty broker crashes. Results in under-replication.
  • 35. 35 © 2023 Cloudera, Inc. All rights reserved. THE ANOMALY DETECTOR Goal violation An optimization goal is violated. Unbalanced cluster. Disk failure A non-empty disk dies. Some replicas might get lost. Broker failure A non-empty broker crashes. Results in under-replication. Fixable
  • 36. 36 © 2023 Cloudera, Inc. All rights reserved. THE ANOMALY DETECTOR Goal violation An optimization goal is violated. Unbalanced cluster. Disk failure A non-empty disk dies. Some replicas might get lost. Metric anomaly One of the collected metric observes an unexpected value. Broker failure A non-empty broker crashes. Results in under-replication. Fixable
  • 37. 37 © 2023 Cloudera, Inc. All rights reserved. THE ANOMALY DETECTOR Goal violation An optimization goal is violated. Unbalanced cluster. Disk failure A non-empty disk dies. Some replicas might get lost. Metric anomaly One of the collected metric observes an unexpected value. Topic anomaly One or more topics violate user-defined properties. Broker failure A non-empty broker crashes. Results in under-replication. Fixable
  • 38. 38 © 2023 Cloudera, Inc. All rights reserved. THE ANOMALY DETECTOR Goal violation An optimization goal is violated. Unbalanced cluster. Disk failure A non-empty disk dies. Some replicas might get lost. Metric anomaly One of the collected metric observes an unexpected value. Topic anomaly One or more topics violate user-defined properties. Broker failure A non-empty broker crashes. Results in under-replication. Fixable Not fixable
  • 39. 39 © 2023 Cloudera, Inc. All rights reserved. EXECUTOR
  • 40. 40 © 2023 Cloudera, Inc. All rights reserved. EXECUTOR Long running Depending on the size of the executed task.
  • 41. 41 © 2023 Cloudera, Inc. All rights reserved. EXECUTOR Interruptible At any point it can be stopped safely. No “undo” functionality. Long running Depending on the size of the executed task.
  • 42. 42 © 2023 Cloudera, Inc. All rights reserved. EXECUTOR Reassignment Optimization is done by reassigning partitions between disks, brokers and adjusting leadership Interruptible At any point it can be stopped safely. No “undo” functionality. Long running Depending on the size of the executed task.
  • 43. 43 © 2023 Cloudera, Inc. All rights reserved. EXECUTOR Reassignment Optimization is done by reassigning partitions between disks, brokers and adjusting leadership Concurrent Reassignment is concurrent, limits are set either manually or by the concurrency adjuster Interruptible At any point it can be stopped safely. No “undo” functionality. Long running Depending on the size of the executed task.
  • 44. 44 © 2023 Cloudera, Inc. All rights reserved. So how do we use it?
  • 45.
  • 46. 46 © 2023 Cloudera, Inc. All rights reserved. What else can it do?
  • 47. 47 © 2023 Cloudera, Inc. All rights reserved. SLOW BROKER FINDER
  • 48. 48 © 2023 Cloudera, Inc. All rights reserved. SLOW BROKER FINDER DETECTION With metrics Log flush—broker log flush 99.9th percentile is used, both absolute and relative to incoming traffic Scoring—brokers have score, every anomaly increases it and every normal behavior decreases it
  • 49. 49 © 2023 Cloudera, Inc. All rights reserved. SLOW BROKER FINDER DETECTION With metrics REMEDIATION Demoting and rebalance Log flush—broker log flush 99.9th percentile is used, both absolute and relative to incoming traffic Scoring—brokers have score, every anomaly increases it and every normal behavior decreases it Demoting—brokers will be demoted as a first attempt Removal—slow brokers can be removed from the cluster if the problem persists after demotion
  • 50. 50 © 2023 Cloudera, Inc. All rights reserved. PROVISIONER
  • 51. 51 © 2023 Cloudera, Inc. All rights reserved. PROVISIONER RIGHTSIZING With API Underprovisioned—the cluster lacks certain kind of resources Overprovisioned—by removing resources and rebalancing it can achieve some cost saving
  • 52. 52 © 2023 Cloudera, Inc. All rights reserved. PROVISIONER RIGHTSIZING With API Underprovisioned—the cluster lacks certain kind of resources Overprovisioned—by removing resources and rebalancing it can achieve some cost saving RECOMMENDATION With goals Resource—a goal can request a resource, like broker, disk to be added or removed Constraints—every goal can give constraints, e.g. racks for which brokers should not be added
  • 53. 53 © 2023 Cloudera, Inc. All rights reserved. PROVISIONER IMPLEMENTATION Provider dependent API—there is an API that can be used as a basis for your implementation Providers—for all providers like AWS or Azure you need to add glue code to acquire or release resources RIGHTSIZING With API Underprovisioned—the cluster lacks certain kind of resources Overprovisioned—by removing resources and rebalancing it can achieve some cost saving RECOMMENDATION With goals Resource—a goal can request a resource, like broker, disk to be added or removed Constraints—every goal can give constraints, e.g. racks for which brokers should not be added
  • 54. 54 © 2023 Cloudera, Inc. All rights reserved. CONCURRENCY ADJUSTER Inter-broker replica reassignment limits aren’t easy
  • 55. 55 © 2023 Cloudera, Inc. All rights reserved. CONCURRENCY ADJUSTER MANUAL LIMITS Challenging Destructive—if the manually set limit is too high, then it can be fast but also overwhelm the cluster Slow—If the limit is too small then rebalances can last for a long time Inter-broker replica reassignment limits aren’t easy
  • 56. 56 © 2023 Cloudera, Inc. All rights reserved. CONCURRENCY ADJUSTER MANUAL LIMITS Challenging Destructive—if the manually set limit is too high, then it can be fast but also overwhelm the cluster Slow—If the limit is too small then rebalances can last for a long time FEEDBACK LOOP Adaptive limit Metrics—metrics are collected to assess whether rebalance concurrency needs adjustments Limits—if metrics are over a threshold then concurrency will adjust with AIMD algorithm Inter-broker replica reassignment limits aren’t easy
  • 57. 57 © 2023 Cloudera, Inc. All rights reserved. CONCURRENCY ADJUSTER RESULTS AIMD Automated—no need for constant oversight for setting concurrency in a fluctuating traffic environment Fast—rebalances will complete faster due to the optimal limit Stable—clients won’t experience any side effects caused by rebalances MANUAL LIMITS Challenging Destructive—if the manually set limit is too high, then it can be fast but also overwhelm the cluster Slow—If the limit is too small then rebalances can last for a long time FEEDBACK LOOP Adaptive limit Metrics—metrics are collected to assess whether rebalance concurrency needs adjustments Limits—if metrics are over a threshold then concurrency will adjust with AIMD algorithm Inter-broker replica reassignment limits aren’t easy
  • 58. 58 © 2023 Cloudera, Inc. All rights reserved. Security matters
  • 59. 59 © 2023 Cloudera, Inc. All rights reserved. SECURITY IN CRUISE CONTROL
  • 60. 60 © 2023 Cloudera, Inc. All rights reserved. SECURITY IN CRUISE CONTROL Basic Auth The simple basic authentication that sends username and password
  • 61. 61 © 2023 Cloudera, Inc. All rights reserved. SECURITY IN CRUISE CONTROL SPNEGO Kerberos over HTTP. Tokens are negotiated via the SPNEGO protocol. Basic Auth The simple basic authentication that sends username and password
  • 62. 62 © 2023 Cloudera, Inc. All rights reserved. SECURITY IN CRUISE CONTROL SPNEGO Kerberos over HTTP. Tokens are negotiated via the SPNEGO protocol. Basic Auth The simple basic authentication that sends username and password Trusted Proxy If a process acts on behalf of a user, the doAs header will forward the username.
  • 63. 63 © 2023 Cloudera, Inc. All rights reserved. SECURITY IN CRUISE CONTROL SPNEGO Kerberos over HTTP. Tokens are negotiated via the SPNEGO protocol. Basic Auth The simple basic authentication that sends username and password Trusted Proxy If a process acts on behalf of a user, the doAs header will forward the username. TLS/HTTPS Communication is secured via HTTPS to the server and TLS to Zookeeper or Kafka
  • 64. 64 © 2023 Cloudera, Inc. All rights reserved. SECURITY IN CRUISE CONTROL SPNEGO Kerberos over HTTP. Tokens are negotiated via the SPNEGO protocol. Basic Auth The simple basic authentication that sends username and password Trusted Proxy If a process acts on behalf of a user, the doAs query param will forward the username. TLS/HTTPS Communication is secured via HTTPS to the server and TLS to Zookeeper or Kafka Authorization Simple model with viewer, user and admin roles.