SlideShare a Scribd company logo
1 of 26
Download to read offline
Raft After ScyllaDB 5.2:
Safe Topology Changes
Konstantin Osipov, Director, Software Engineering
Konstantin Osipov
■ Worked on lightweight transactions in ScyllaDB
■ Crazy about distributed systems testing
■ Muscovite and a father of two
Your photo
goes here,
smile :)
■ ScyllaDB 5.2
■ Topology on Raft
■ Tablet outlook
Presentation Agenda
Previous Episodes
Problem Overview
Strong vs Eventual Consistency
Strong consistency
Node 1 Node 2
1. Write from
client
4. Acknowledged
to client
2. Write propagated
through cluster
3.Internal
acknowledgement
Eventual consistency
Node 1 Node 2
1. Write from
client
2. Acknowledged
to client
3. Eventual write
propagation
● requires a live majority
● always returns latest write
● highly available
● writes must commute
Data vs metadata
- metadata - data
Schema information: table,
view, type definitions
Topology information:
nodes, tokens
Static and dynamic rows,
counters
Replicated everywhere Partitioned
Not commutative Commutative
Changes rarely Changes frequently
Consistency of Metadata
1
2 3
3
1 2
replication_factor=2
ScyllaDB cluster
Raft for Metadata Replication
Consensus
module
State
machine
Log
x←1 y←2 z←3
Consensus
module
State
machine
Log
x←1 y←2 z←3
Consensus
module
State
machine
Log
x←1 y←2 z←3
Node A Node B Node C
ScyllaDB 5.2: Raft GA
5.2 Delivers:
■ --consistent-cluster-management
■ Safe schema changes
■ No data loss
■ Fast schema propagation
■ recovery after a loss of majority
■ IP address change support
Enabling Raft
■ 5.2: ON for new clusters
■ 5.3: ON by default
■ In future versions: mandatory
# Use Raft to consistently manage schema information in the cluster.
# Refer to https://docs.scylladb.com/master/architecture/raft.html for
# more details.
consistent_cluster_management: true
Recovery After a Lost Majority
■ isolate dead members
■ set the recovery key in system.scylla_local
■ restart each node
■ truncate Raft state, cear the recovery key
■ restart to assemble new cluster
Changing IPs
Region 1
Availability zone 1 Availability zone 2 Availability zone 3
100.1.1.1 100.1.153.7
100.1.1.3 100.1.153.4
100.2.1.1 100.2.1.2
100.2.1.3 100.2.1.9
100.3.1.1 100.3.1.2
100.3.1.13 100.3.1.4
Changing IPs
■ Restart to change IPs
■ If all IPs change, update the seeds
Key Tenets of Raft
Based Topology
■ Concurrent topology operations may corrupt your cluster
■ Even when bootstrapping a fresh cluster
■ A failure during topology change may reduce consistency
■ Topology operations take time
■ Even if streaming is quick
Limitations We Aim to Address
■ Raft group includes all cluster members
■ Token metadata is replicated using Raft
■ No stale topology
Moving Topology Data to Raft
■ Runs alongside Raft leader
■ Highly available
■ Drives the progress
■ Performs linearizable reads and writes of token metadata
■ Request coordinators still use the local view on topology
■ No extra coordination when executing user requests
The Centralized Coordinator
node A
bootstrap
bootstrap
Linearizable Token Metadata
node B
node C
system.token_metadata
Read
barrier
Read
barrier
Automatic Coordinator Failover
■ A key change in read/write path is fencing
■ Each write is signed with topology version
■ If there is a version mismatch, the write doesn’t go through
Change in the Data Plane
Replica
Coordinator
Topology
coordinator
R
e
q
(
V
1
)
Req(V 1
)
F
e
n
c
e
(
V
1
)
D
r
a
i
n
(
V
1
)
Req(V 2
)
■ Requesting multiple operations concurrently is safe
■ Sanity checks for operator error
■ Failed operations are aborted automatically
■ Faster topology changes
The Takeaways
The Journey Continues
■ system.broadcast_kv_store
■ --experimental in 5.2
UPDATE system.broadcast_kv_store
SET value = {new_value}
WHERE key = {key} [IF value = {value_condition}];
SELECT value
WHERE key = {key}
FROM system.broadcast_kv_store;
Strongly Consistent Tables
ScyllaDB Journey to Tablets
Raft
Safe
schema
changes
Safe
topology
changes
Dynamic
partitioning
Consistent
tables
Tablets
5.0
5.2
5.3
Thank You
Stay in Touch
Konstantin Osipov
kostja@scylladb.com
kostja_osipov
kostja
kostja

More Related Content

Similar to Raft After ScyllaDB 5.2: Safe Topology Changes

MySQL Multi Master Replication
MySQL Multi Master ReplicationMySQL Multi Master Replication
MySQL Multi Master ReplicationMoshe Kaplan
 
MySQL Multi-Master Replication
MySQL Multi-Master ReplicationMySQL Multi-Master Replication
MySQL Multi-Master ReplicationMichael Naumov
 
Spil Games @ FOSDEM: Galera Replicator IRL
Spil Games @ FOSDEM: Galera Replicator IRLSpil Games @ FOSDEM: Galera Replicator IRL
Spil Games @ FOSDEM: Galera Replicator IRLspil-engineering
 
Distributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data StreamsDistributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data StreamsArinto Murdopo
 
State Management in Apache Flink : Consistent Stateful Distributed Stream Pro...
State Management in Apache Flink : Consistent Stateful Distributed Stream Pro...State Management in Apache Flink : Consistent Stateful Distributed Stream Pro...
State Management in Apache Flink : Consistent Stateful Distributed Stream Pro...Paris Carbone
 
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...DataStax Academy
 
Migrating to XtraDB Cluster
Migrating to XtraDB ClusterMigrating to XtraDB Cluster
Migrating to XtraDB Clusterpercona2013
 
Graph processing
Graph processingGraph processing
Graph processingyeahjs
 
More Efficient Object Replication in OpenStack Summit Juno
More Efficient Object Replication in OpenStack Summit JunoMore Efficient Object Replication in OpenStack Summit Juno
More Efficient Object Replication in OpenStack Summit JunoKota Tsuyuzaki
 
Lec13 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Multicore
Lec13 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- MulticoreLec13 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Multicore
Lec13 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- MulticoreHsien-Hsin Sean Lee, Ph.D.
 
ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?Jagadish Venkatraman
 
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Lucidworks
 
Анализ телеметрии при масштабировании, Theo Schlossnagle (Circonus)
Анализ телеметрии при масштабировании, Theo Schlossnagle (Circonus)Анализ телеметрии при масштабировании, Theo Schlossnagle (Circonus)
Анализ телеметрии при масштабировании, Theo Schlossnagle (Circonus)Ontico
 
Migrating to XtraDB Cluster
Migrating to XtraDB ClusterMigrating to XtraDB Cluster
Migrating to XtraDB Clusterpercona2013
 
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...Flink Forward
 
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...Flink Forward
 
Disaggregated Networking - The Drivers, the Software & The High Availability
Disaggregated Networking - The Drivers, the Software & The High AvailabilityDisaggregated Networking - The Drivers, the Software & The High Availability
Disaggregated Networking - The Drivers, the Software & The High AvailabilityOpen Networking Summit
 
"Stateful app as an efficient way to build dispatching for riders and drivers...
"Stateful app as an efficient way to build dispatching for riders and drivers..."Stateful app as an efficient way to build dispatching for riders and drivers...
"Stateful app as an efficient way to build dispatching for riders and drivers...Fwdays
 

Similar to Raft After ScyllaDB 5.2: Safe Topology Changes (20)

MySQL Multi Master Replication
MySQL Multi Master ReplicationMySQL Multi Master Replication
MySQL Multi Master Replication
 
MySQL Multi-Master Replication
MySQL Multi-Master ReplicationMySQL Multi-Master Replication
MySQL Multi-Master Replication
 
Spil Games @ FOSDEM: Galera Replicator IRL
Spil Games @ FOSDEM: Galera Replicator IRLSpil Games @ FOSDEM: Galera Replicator IRL
Spil Games @ FOSDEM: Galera Replicator IRL
 
Distributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data StreamsDistributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data Streams
 
State Management in Apache Flink : Consistent Stateful Distributed Stream Pro...
State Management in Apache Flink : Consistent Stateful Distributed Stream Pro...State Management in Apache Flink : Consistent Stateful Distributed Stream Pro...
State Management in Apache Flink : Consistent Stateful Distributed Stream Pro...
 
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
 
Migrating to XtraDB Cluster
Migrating to XtraDB ClusterMigrating to XtraDB Cluster
Migrating to XtraDB Cluster
 
Graph processing
Graph processingGraph processing
Graph processing
 
More Efficient Object Replication in OpenStack Summit Juno
More Efficient Object Replication in OpenStack Summit JunoMore Efficient Object Replication in OpenStack Summit Juno
More Efficient Object Replication in OpenStack Summit Juno
 
Lec13 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Multicore
Lec13 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- MulticoreLec13 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Multicore
Lec13 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Multicore
 
ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?
 
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
 
Анализ телеметрии при масштабировании, Theo Schlossnagle (Circonus)
Анализ телеметрии при масштабировании, Theo Schlossnagle (Circonus)Анализ телеметрии при масштабировании, Theo Schlossnagle (Circonus)
Анализ телеметрии при масштабировании, Theo Schlossnagle (Circonus)
 
Migrating to XtraDB Cluster
Migrating to XtraDB ClusterMigrating to XtraDB Cluster
Migrating to XtraDB Cluster
 
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
 
Galera webinar migration to galera cluster from my sql async replication
Galera webinar migration to galera cluster from my sql async replicationGalera webinar migration to galera cluster from my sql async replication
Galera webinar migration to galera cluster from my sql async replication
 
Geode - Day 2
Geode - Day 2Geode - Day 2
Geode - Day 2
 
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
 
Disaggregated Networking - The Drivers, the Software & The High Availability
Disaggregated Networking - The Drivers, the Software & The High AvailabilityDisaggregated Networking - The Drivers, the Software & The High Availability
Disaggregated Networking - The Drivers, the Software & The High Availability
 
"Stateful app as an efficient way to build dispatching for riders and drivers...
"Stateful app as an efficient way to build dispatching for riders and drivers..."Stateful app as an efficient way to build dispatching for riders and drivers...
"Stateful app as an efficient way to build dispatching for riders and drivers...
 

More from ScyllaDB

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLScyllaDB
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...ScyllaDB
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityScyllaDB
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptxScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDBScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsScyllaDB
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesScyllaDB
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsScyllaDB
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101ScyllaDB
 

More from ScyllaDB (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual Workshop
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & Tradeoffs
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 

Recently uploaded

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Raft After ScyllaDB 5.2: Safe Topology Changes

  • 1. Raft After ScyllaDB 5.2: Safe Topology Changes Konstantin Osipov, Director, Software Engineering
  • 2. Konstantin Osipov ■ Worked on lightweight transactions in ScyllaDB ■ Crazy about distributed systems testing ■ Muscovite and a father of two Your photo goes here, smile :)
  • 3. ■ ScyllaDB 5.2 ■ Topology on Raft ■ Tablet outlook Presentation Agenda
  • 6. Strong vs Eventual Consistency Strong consistency Node 1 Node 2 1. Write from client 4. Acknowledged to client 2. Write propagated through cluster 3.Internal acknowledgement Eventual consistency Node 1 Node 2 1. Write from client 2. Acknowledged to client 3. Eventual write propagation ● requires a live majority ● always returns latest write ● highly available ● writes must commute
  • 7. Data vs metadata - metadata - data Schema information: table, view, type definitions Topology information: nodes, tokens Static and dynamic rows, counters Replicated everywhere Partitioned Not commutative Commutative Changes rarely Changes frequently Consistency of Metadata 1 2 3 3 1 2 replication_factor=2 ScyllaDB cluster
  • 8. Raft for Metadata Replication Consensus module State machine Log x←1 y←2 z←3 Consensus module State machine Log x←1 y←2 z←3 Consensus module State machine Log x←1 y←2 z←3 Node A Node B Node C
  • 10. 5.2 Delivers: ■ --consistent-cluster-management ■ Safe schema changes ■ No data loss ■ Fast schema propagation ■ recovery after a loss of majority ■ IP address change support
  • 11. Enabling Raft ■ 5.2: ON for new clusters ■ 5.3: ON by default ■ In future versions: mandatory # Use Raft to consistently manage schema information in the cluster. # Refer to https://docs.scylladb.com/master/architecture/raft.html for # more details. consistent_cluster_management: true
  • 12. Recovery After a Lost Majority ■ isolate dead members ■ set the recovery key in system.scylla_local ■ restart each node ■ truncate Raft state, cear the recovery key ■ restart to assemble new cluster
  • 13. Changing IPs Region 1 Availability zone 1 Availability zone 2 Availability zone 3 100.1.1.1 100.1.153.7 100.1.1.3 100.1.153.4 100.2.1.1 100.2.1.2 100.2.1.3 100.2.1.9 100.3.1.1 100.3.1.2 100.3.1.13 100.3.1.4
  • 14. Changing IPs ■ Restart to change IPs ■ If all IPs change, update the seeds
  • 15. Key Tenets of Raft Based Topology
  • 16. ■ Concurrent topology operations may corrupt your cluster ■ Even when bootstrapping a fresh cluster ■ A failure during topology change may reduce consistency ■ Topology operations take time ■ Even if streaming is quick Limitations We Aim to Address
  • 17. ■ Raft group includes all cluster members ■ Token metadata is replicated using Raft ■ No stale topology Moving Topology Data to Raft
  • 18. ■ Runs alongside Raft leader ■ Highly available ■ Drives the progress ■ Performs linearizable reads and writes of token metadata ■ Request coordinators still use the local view on topology ■ No extra coordination when executing user requests The Centralized Coordinator
  • 19. node A bootstrap bootstrap Linearizable Token Metadata node B node C system.token_metadata Read barrier Read barrier
  • 21. ■ A key change in read/write path is fencing ■ Each write is signed with topology version ■ If there is a version mismatch, the write doesn’t go through Change in the Data Plane Replica Coordinator Topology coordinator R e q ( V 1 ) Req(V 1 ) F e n c e ( V 1 ) D r a i n ( V 1 ) Req(V 2 )
  • 22. ■ Requesting multiple operations concurrently is safe ■ Sanity checks for operator error ■ Failed operations are aborted automatically ■ Faster topology changes The Takeaways
  • 24. ■ system.broadcast_kv_store ■ --experimental in 5.2 UPDATE system.broadcast_kv_store SET value = {new_value} WHERE key = {key} [IF value = {value_condition}]; SELECT value WHERE key = {key} FROM system.broadcast_kv_store; Strongly Consistent Tables
  • 25. ScyllaDB Journey to Tablets Raft Safe schema changes Safe topology changes Dynamic partitioning Consistent tables Tablets 5.0 5.2 5.3
  • 26. Thank You Stay in Touch Konstantin Osipov kostja@scylladb.com kostja_osipov kostja kostja