Scylla on Kubernetes: Introducing the Scylla Operator
Yannis Zarkadas, Software Engineer @ Arrikto
Presenter
Yannis Zarkadas, Software Engineer
■ Storage, DevOps, ML-Engineering
■ Open Source Enthusiast:
● Scylla Operator
● Cassandra Operator in rook.io
● Kubeflow
Problem Statement
● Scylla: great database
● Scylla: requires operational expertise
● Kubernetes: great workload management platform
Can we leverage Kubernetes to write a great management layer for Scylla?
[Diagram: Kubernetes architecture. kubectl apply -f saves an object to the API Server on the Master, which is backed by etcd. Controllers write a new Pod, and the Scheduler schedules it onto one of Nodes 1-4. Each Node runs a kubelet and hosts Pods such as nginx, MySQL, and tomcat.]
StatefulSet
Deploys and scales stateful software.
Provides guarantees for:
■ Pod uniqueness
● At most 1 of each Pod exists at any given time
■ Pod ordering
● Rolling Update and Deployment
■ Persistent network and storage identity
● DNS record (network identity) and own Persistent Volume (storage identity)
StatefulSet Controller
[Diagram: kubectl apply -f saves a StatefulSet and a Headless Service to the API Server. The StatefulSet Controller creates the Pods one at a time, in order: scylla-0, scylla-1, scylla-2. Each Pod gets a stable DNS name through the Headless Service: scylla-0.scylla.default.svc.cluster.local, scylla-1.scylla.default.svc.cluster.local, scylla-2.scylla.default.svc.cluster.local. With spec.replicas: 3, status.replicas and status.readyReplicas step from 0 to 3.]
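The stable DNS names on this slide follow a fixed pattern: Pod name (StatefulSet name plus ordinal), Headless Service name, namespace, then the cluster domain. A minimal sketch of that pattern, assuming the default cluster domain cluster.local; the function name is hypothetical and only used for illustration:

```python
def stateful_pod_dns_names(statefulset: str, service: str,
                           namespace: str, replicas: int) -> list[str]:
    """Return the stable DNS name of each Pod, in ordinal order.

    Assumes the default cluster domain "cluster.local".
    """
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.cluster.local"
        for ordinal in range(replicas)
    ]


# The example from the slide: StatefulSet "scylla" behind the Headless
# Service "scylla" in the "default" namespace, with 3 replicas.
names = stateful_pod_dns_names("scylla", "scylla", "default", 3)
for name in names:
    print(name)
```

Because the name depends only on the ordinal, a Pod that is rescheduled keeps the same DNS record.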
Controller Pattern
Used everywhere in Kubernetes
[Diagram: a Controller loops through Observe, Calculate, Reconcile. It compares the Spec (desired) with the Status (real) of Kubernetes Objects, which represent Physical Resources, and writes the changes.]
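The Observe, Calculate, Reconcile loop can be sketched in a few lines. This is a toy model, not Kubernetes code: the ReplicatedObject class and both functions are hypothetical names, and "creating" or "deleting" a Pod is reduced to changing a counter.

```python
from dataclasses import dataclass


@dataclass
class ReplicatedObject:
    spec_replicas: int        # desired state, written by the user
    status_replicas: int = 0  # real state, updated by the controller


def reconcile(obj: ReplicatedObject) -> str:
    """Run one pass of the loop and take at most one action."""
    # Observe both states and calculate the difference.
    diff = obj.spec_replicas - obj.status_replicas
    # Reconcile: move the real state one step toward the desired state.
    if diff > 0:
        obj.status_replicas += 1
        return "create"
    if diff < 0:
        obj.status_replicas -= 1
        return "delete"
    return "noop"


def run_until_converged(obj: ReplicatedObject, max_steps: int = 100) -> list[str]:
    """Call reconcile until nothing is left to do; return the actions taken."""
    actions = []
    for _ in range(max_steps):
        action = reconcile(obj)
        if action == "noop":
            break
        actions.append(action)
    return actions


obj = ReplicatedObject(spec_replicas=3)
scale_up_actions = run_until_converged(obj)    # three "create" actions
obj.spec_replicas = 2
scale_down_actions = run_until_converged(obj)  # one "delete" action
```

Taking one small step per pass keeps every pass idempotent: if the controller crashes midway, the next pass recomputes the difference and carries on.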
Custom Resource Definition
■ Store Custom Objects
■ Compatible with kubectl
● kubectl get clusters
The Operator Pattern
Operator = Controller(s) + CRD(s)
[Diagram: a Controller running the Observe, Calculate, Reconcile loop and writing changes.]
Why the StatefulSet is not enough
StatefulSet: Confined to 1 Rack
[Diagram: a Scylla Cluster is made of Datacenters, Racks, and Members. A Member maps to a Pod, and a single StatefulSet covers one Rack.]
Multiple Racks? Multiple Datacenters?
Safe Scale Down
● Want to leave
○ nodetool decommission
● Stream data
● Leave
[Diagram: Scylla Ring with six Members at ring positions 0, 44, 88, 132, 176, 220. member-0 through member-4 are Up; member-5 goes from Up to Leaving.]
StatefulSet: Unsafe Scale Down
[Diagram: a StatefulSet with Pods scylla-0, scylla-1, and scylla-2, each with its own DNS record under scylla.default.svc.cluster.local. kubectl apply -f saves the StatefulSet with spec.replicas changed from 3 to 2, and the StatefulSet Controller removes scylla-2. Scylla Ring: scylla-0 Up, scylla-1 Up, scylla-2 goes from Up to Down.]
Scale Down?
Data not streamed!
Potential Data Loss!
StatefulSet: Cannot track Member identity
[Diagram: Pods scylla-0, scylla-1, and scylla-2, each with its own DNS record under scylla.default.svc.cluster.local. After a Node Fail, a Member is Joining.]
Replace Member? Add new Member?
Must know Member identity beforehand!
Vanilla Solution: StatefulSet
Problems with:
■ Seeds
■ Multi-zone deployment
■ Scale Down
■ Loss of Persistence
■ Backups/Restores
■ Extensibility
What if we could create management software in
the image of Kubernetes Controllers?
Design
Our goal
Operator = Controller(s) + CRD(s)
[Diagram: a Controller running the Observe, Calculate, Reconcile loop and writing changes.]
[Diagram: operator design. The Controller watches the Cluster Custom Resource and writes one StatefulSet per Rack (Rack 1, Datacenter 1; Rack 1, Datacenter 2; ...; Rack N, Datacenter M). Each Pod runs a Sidecar that talks to Scylla over JMX/HTTP, and each Member has a Member Service with a Static IP. The Controller and the Sidecars communicate through Labels / Annotations.]
Mapping of Abstractions
Scylla -> Kubernetes
Member -> Pod
Rack -> StatefulSet
Datacenter -> StatefulSets
Cluster -> Cluster Custom Resource
Sidecar
CRD + Controller + Sidecar
Sidecar needed to:
■ Set up config files
■ Install plugins at startup
■ Backup and Restore functionality
■ Future extensibility
[Diagram: a Member Pod in which the Sidecar talks to Scylla over JMX/HTTP.]
An Alternative to DNS Records
Much Requested Feature ->
■ What if we could have static IPs?
Services already have a static IP, called ClusterIP.
Solution: ClusterIP Service per Pod
Drawbacks?
■ Performance: iptables can handle a few hundred Members, IPVS can handle thousands with no problem.
■ ClusterIP CIDR Depletion: Usually a /12 IP Block, so plenty of addresses.
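To put a number on "plenty of addresses": a /12 block leaves 32 - 12 = 20 host bits, so it holds 2^20 addresses. The check below uses Python's standard ipaddress module. The block 10.96.0.0/12 is an assumed example, chosen because the Member Service IPs shown in the later slides (10.96.0.1, 10.96.0.3, 10.96.0.4) fall inside it; a real cluster's Service CIDR may differ.

```python
import ipaddress

# Assumed example Service CIDR; check your own cluster's configuration.
service_cidr = ipaddress.ip_network("10.96.0.0/12")

# A /12 leaves 20 host bits, so the block holds 2 ** 20 addresses.
total_addresses = service_cidr.num_addresses

# The Member Service IPs from the slides fall inside this block.
member_ips = ["10.96.0.1", "10.96.0.3", "10.96.0.4"]
all_inside = all(ipaddress.ip_address(ip) in service_cidr for ip in member_ips)

print(total_addresses, all_inside)
```

With one ClusterIP Service per Member, even a cluster of thousands of Members uses a small fraction of such a block.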
Implementation
Cluster Creation & Scale Up
[Diagram: kubectl apply saves a new Scylla Cluster object to the API Server. Spec: eu-west1-b: 1 Members, eu-west1-c: 2 Members. Status starts at 0 Members, 0 ReadyMembers for both racks. The Scylla Operator writes one StatefulSet per rack and raises replicas step by step (eu-west1-b from 0 to 1, eu-west1-c from 0 to 1 to 2). Resulting Pods and Member Services: scylla-eu-west1-b-0 (10.96.0.1), scylla-eu-west1-c-0 (10.96.0.3), scylla-eu-west1-c-1 (10.96.0.4). Status ends at 1 Members, 1 ReadyMembers for eu-west1-b and 2 Members, 2 ReadyMembers for eu-west1-c.]
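The Pod names on this slide can be derived from the Cluster spec. StatefulSet Pods are always named after their StatefulSet plus an ordinal, so a Pod called scylla-eu-west1-b-0 implies a StatefulSet called scylla-eu-west1-b. The sketch below expands a per-rack member count into the expected names; the function name and the shape of the racks argument are assumptions made for illustration, not the operator's actual data model.

```python
def expected_members(cluster: str, racks: dict[str, int]) -> dict[str, list[str]]:
    """Map each per-rack StatefulSet name to the Pod names it should own.

    `racks` maps a rack name to its desired number of Members.
    """
    return {
        f"{cluster}-{rack}": [f"{cluster}-{rack}-{ordinal}" for ordinal in range(count)]
        for rack, count in racks.items()
    }


# The Spec from the slide: 1 Member in eu-west1-b, 2 Members in eu-west1-c.
layout = expected_members("scylla", {"eu-west1-b": 1, "eu-west1-c": 2})
for statefulset, pods in layout.items():
    print(statefulset, pods)
```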
Scale Down
[Diagram: kubectl apply saves a Cluster change that scales down eu-west1-c from 2 Members to 1. The Scylla Operator sees the Cluster changed and marks the Member Service of scylla-eu-west1-c-1 (10.96.0.4) with decommissioned: false. The Sidecar in that Pod runs nodetool decommission, the Member streams its data to the remaining Members, and the Member Service is marked decommissioned: true. The eu-west1-c StatefulSet replicas then go from 2 to 1. Scylla Ring: scylla-eu-west1-b-0 Up, scylla-eu-west1-c-0 Up, scylla-eu-west1-c-1 goes from Up to Leaving.]
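The scale-down handshake between the Operator and the Sidecar can be modelled as a small state machine driven by the decommissioned label on the Member Service. This is a simplified sketch under stated assumptions: the Rack class, both step functions, and the exact label handling are hypothetical, and the decommission argument is a stand-in for running nodetool decommission.

```python
from dataclasses import dataclass, field


@dataclass
class Rack:
    name: str
    desired_members: int       # from the Cluster spec
    statefulset_replicas: int  # current size of the rack's StatefulSet
    # One label dict per Member Service, indexed by Pod ordinal.
    service_labels: list[dict] = field(default_factory=list)


def operator_step(rack: Rack) -> str:
    """One reconcile pass of the (simplified) operator for a single rack."""
    if rack.desired_members >= rack.statefulset_replicas:
        return "nothing to do"
    # Only the Member with the highest ordinal may leave.
    labels = rack.service_labels[rack.statefulset_replicas - 1]
    state = labels.get("decommissioned")
    if state is None:
        labels["decommissioned"] = "false"  # ask the Sidecar to decommission
        return "requested decommission"
    if state == "false":
        return "waiting for sidecar"
    # The data has been streamed, so removing the Pod is now safe.
    rack.statefulset_replicas -= 1
    rack.service_labels.pop()
    return "scaled down StatefulSet"


def sidecar_step(labels: dict, decommission) -> bool:
    """Decommission the local Member if the operator asked for it."""
    if labels.get("decommissioned") != "false":
        return False
    decommission()  # stand-in for `nodetool decommission`
    labels["decommissioned"] = "true"
    return True


# Scale eu-west1-c from 2 Members down to 1, as in the diagram.
rack = Rack("eu-west1-c", desired_members=1, statefulset_replicas=2,
            service_labels=[{}, {}])
calls = []
log = [operator_step(rack), operator_step(rack)]
sidecar_step(rack.service_labels[-1], lambda: calls.append("decommission"))
log += [operator_step(rack), operator_step(rack)]
```

The key property is ordering: the StatefulSet is only shrunk after the label reads true, which avoids the "Data not streamed!" case of a plain StatefulSet scale down.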
Local Storage vs Network Attached
Local NVMe SSD:
■ Fast
■ Ephemeral
Network Attached Storage (AWS EBS, Google Persistent Disk):
■ Slow
■ Fault-tolerant
Scylla handles replication => Use Local Storage!
Kubernetes v1.10: Local Persistent Volumes in Beta
Local Storage Failure Scenarios
■ Disk Misbehaves
● Block errors
● Deteriorating performance
● Result: Pod still runs. Unhandled by K8s.
■ Disk Fails
● Mount Point Disappears
● Result: Pod fails to start. Unhandled by K8s.
■ Node Fails
● With Disk on it
● Result: Pod fails to be scheduled. Unhandled by K8s.
Common in the Cloud!
Node Fail
[Diagram: member-0 (10.96.0.1), member-1 (10.96.0.3), and member-2 (10.96.0.4) each run on their own Node with a local disk at /mnt/ssd1 and a Member Service. Node 3, which hosts member-1, fails. An Admin or Fencing Software deletes Node 3. The Scylla Operator sees the StatefulSet changed and recreates the PVC. member-1 is rescheduled onto Node 2 with an Empty Disk and keeps its Member Service IP, 10.96.0.3.]
Algorithm (Node Fail, Empty Disk):
1. Cluster Member? (search with IP)
2. If yes: Empty Disk?
3. If yes: Stream Missing Data (replace_address_first_boot option)
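The decision above fits in one small function. A minimal sketch, assuming the caller already knows the Member's static Service IP, the set of IPs currently in the Scylla ring, and whether the data directory is empty; the function name and parameters are hypothetical. It returns the address to pass as replace_address_first_boot, or None when the Member should start normally.

```python
from typing import Optional


def replace_address_for(member_ip: str, ring_ips: set[str],
                        data_dir_is_empty: bool) -> Optional[str]:
    """Return the value for replace_address_first_boot, or None.

    A Member is replaced only if the ring already knows its IP
    (it was a Cluster Member before) and its disk came back empty.
    """
    if member_ip in ring_ips and data_dir_is_empty:
        return member_ip
    return None


ring = {"10.96.0.1", "10.96.0.3", "10.96.0.4"}

# member-1 rescheduled after the Node Fail: known IP, empty disk.
replaced = replace_address_for("10.96.0.3", ring, data_dir_is_empty=True)

# Ordinary restart: known IP, data still on disk.
restarted = replace_address_for("10.96.0.3", ring, data_dir_is_empty=False)

# Brand-new Member during scale up: IP not yet in the ring.
joined = replace_address_for("10.96.0.5", ring, data_dir_is_empty=True)
```

This only works because the Member Service IP stays the same across rescheduling, which is why the design gives each Member a static ClusterIP.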
Demo
Take away
Kubernetes helps to manage Scylla, but has some limitations:
■ CPU Pinning
● Huge performance gains.
● Must be enabled in the kubelet.
● Many managed solutions don’t enable it.
■ Local Storage
● Supported but still needs improvement.
● Some vendors don’t offer high storage machines for K8s.
■ Multi-Region Clusters
● Still an unsolved problem.
“Cost of Containerization” by Moreno Garcia:
https://www.scylladb.com/2018/08/09/cost-containerization-scylla/
Future Work
Scylla Operator
■ Repairs with Scylla Manager
■ Multi-Region Clusters
● Very early support in Kubernetes
● LoadBalancer per Pod is a possible workaround
■ Backups and Restores
■ File your own issue:
● https://github.com/scylladb/scylla-operator
Kubernetes
■ Better Support for Local Storage
● Monitoring, scheduling
Thank you
Stay in touch
Any questions?
Yannis Zarkadas
yanniszark@arrikto.com
@yanniszark

Editor's notes

  1. Overview of distributed nature of Scylla
  2. Overview: each member stores a different portion of the data
  3. Intro to Kubernetes. Smallest unit of processing: Pod. Declarative nature: the user declares the desired state and Kubernetes works to satisfy it.
  4. Kubernetes’ solution for running DBs: StatefulSet
  5. Example of how the StatefulSet works
  6. Controller pattern that appears everywhere in K8s: 1. Observe desired state 2. Calculate actual state 3. Diff and take action
  7. What is missing to enable us to build our own controller? Custom Objects. CRDs enable us to store custom objects in etcd.
  8. Operator pattern. Controller acts as a human operator would.
  9. Examples of how our design addresses each of the StatefulSet’s shortcomings.