Raft After ScyllaDB 5.2: Safe Topology Changes

ScyllaDB's drive towards strongly consistent features continues, and in this talk I will cover the upcoming safe topology changes feature: our rethinking of adding and removing nodes in a Scylla cluster.
Quickly assembling a fresh cluster, performing topology and schema changes concurrently, and quickly restarting a node with a different IP address or configuration have all become possible thanks to a centralized, yet fault-tolerant, topology change coordinator: the new algorithm we implemented for Scylla 5.3. The next step is automatically changing data placement to adjust to the load and distribution of data; I will touch upon these future plans as well.


1. Raft After ScyllaDB 5.2: Safe Topology Changes (Konstantin Osipov, Director, Software Engineering)
2. Konstantin Osipov
   ■ Worked on lightweight transactions in ScyllaDB
   ■ Crazy about distributed systems testing
   ■ Muscovite and a father of two
3. Presentation Agenda
   ■ ScyllaDB 5.2
   ■ Topology on Raft
   ■ Tablet outlook
4. Previous Episodes
5. Problem Overview
6. Strong vs Eventual Consistency
   Strong consistency (nodes 1 and 2): 1. write from client; 2. write propagated through the cluster; 3. internal acknowledgement; 4. acknowledged to client.
   ● requires a live majority
   ● always returns the latest write
   Eventual consistency (nodes 1 and 2): 1. write from client; 2. acknowledged to client; 3. eventual write propagation.
   ● highly available
   ● writes must commute
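To make the two acknowledgement flows concrete, here is a minimal Python sketch; the names (Replica, quorum_write, eventual_write) are illustrative, not ScyllaDB internals:

    # Minimal sketch: when each consistency model acknowledges a write.
    class Replica:
        def __init__(self, name):
            self.name = name
            self.data = {}

        def apply(self, key, value):
            self.data[key] = value
            return True  # internal acknowledgement

    def quorum_write(replicas, key, value):
        # Strong consistency: ack the client only after a live majority applies.
        acks = sum(r.apply(key, value) for r in replicas)
        if acks < len(replicas) // 2 + 1:
            raise RuntimeError("no live majority; the write fails")
        return "acknowledged to client"

    def eventual_write(replicas, key, value):
        # Eventual consistency: ack after the local write, propagate later.
        replicas[0].apply(key, value)   # local write
        ack = "acknowledged to client"  # the client is unblocked here
        for r in replicas[1:]:          # eventual propagation
            r.apply(key, value)
        return ack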
7. Consistency of Metadata: Data vs Metadata
   Metadata: schema information (table, view, and type definitions) and topology information (nodes, tokens); replicated everywhere; not commutative; changes rarely.
   Data: static and dynamic rows, counters; partitioned; commutative; changes frequently.
   (Diagram: a three-node ScyllaDB cluster with replication_factor=2.)
8. Raft for Metadata Replication
   (Diagram: nodes A, B, and C, each with a consensus module, a replicated log, and a state machine; the log entries x←1, y←2, z←3 are applied in the same order on every node.)
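The diagram's point is that every node applies the same committed log, in the same order, to its local state machine, so all nodes converge to the same state. A toy Python sketch (hypothetical names, not ScyllaDB's Raft implementation):

    # Toy replicated state machine: identical log, identical order,
    # identical resulting state on every node.
    class Node:
        def __init__(self, name):
            self.name = name
            self.state = {}
            self.applied = 0  # index of the last applied entry

        def apply_committed(self, log, commit_index):
            # Apply entries strictly in log order, up to the commit index.
            while self.applied < commit_index:
                key, value = log[self.applied]
                self.state[key] = value
                self.applied += 1

    log = [("x", 1), ("y", 2), ("z", 3)]  # entries committed by consensus
    nodes = [Node(n) for n in "ABC"]
    for node in nodes:
        node.apply_committed(log, commit_index=len(log))
    assert all(n.state == {"x": 1, "y": 2, "z": 3} for n in nodes)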
9. ScyllaDB 5.2: Raft GA
10. 5.2 Delivers:
   ■ --consistent-cluster-management
   ■ Safe schema changes
   ■ No data loss
   ■ Fast schema propagation
   ■ Recovery after a loss of majority
   ■ IP address change support
11. Enabling Raft
   ■ 5.2: ON for new clusters
   ■ 5.3: ON by default
   ■ In future versions: mandatory

       # Use Raft to consistently manage schema information in the cluster.
       # Refer to https://docs.scylladb.com/master/architecture/raft.html for
       # more details.
       consistent_cluster_management: true
12. Recovery After a Lost Majority
   ■ Isolate dead members
   ■ Set the recovery key in system.scylla_local (see the sketch below)
   ■ Restart each node
   ■ Truncate the Raft state and clear the recovery key
   ■ Restart to assemble the new cluster
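As a sketch of the recovery-key step, assuming the key name documented for ScyllaDB 5.2's manual recovery procedure (group0_upgrade_state set to 'recovery' in system.scylla_local; verify the exact key against your version's documentation), using the Python driver:

    # Sketch: setting and clearing the recovery key on a node.
    # The key name is an assumption based on the 5.2 docs; check yours.
    from cassandra.cluster import Cluster

    def set_recovery_key(node_ip):
        cluster = Cluster([node_ip])
        session = cluster.connect()
        try:
            session.execute(
                "UPDATE system.scylla_local SET value = 'recovery' "
                "WHERE key = 'group0_upgrade_state'"
            )
        finally:
            cluster.shutdown()

    def clear_recovery_key(node_ip):
        cluster = Cluster([node_ip])
        session = cluster.connect()
        try:
            session.execute(
                "DELETE FROM system.scylla_local "
                "WHERE key = 'group0_upgrade_state'"
            )
        finally:
            cluster.shutdown()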
13. Changing IPs
   (Diagram: a region with three availability zones, each containing four nodes with addresses such as 100.1.1.1, 100.2.1.1, and 100.3.1.1.)
14. Changing IPs
   ■ Restart to change IPs
   ■ If all IPs change, update the seeds
15. Key Tenets of Raft-Based Topology
16. Limitations We Aim to Address
   ■ Concurrent topology operations may corrupt your cluster, even when bootstrapping a fresh cluster
   ■ A failure during a topology change may reduce consistency
   ■ Topology operations take time, even if streaming is quick
17. Moving Topology Data to Raft
   ■ The Raft group includes all cluster members
   ■ Token metadata is replicated using Raft
   ■ No stale topology
18. The Centralized Coordinator
   ■ Runs alongside the Raft leader
   ■ Highly available
   ■ Drives the progress of topology changes
   ■ Performs linearizable reads and writes of token metadata
   ■ Request coordinators still use their local view of the topology
   ■ No extra coordination when executing user requests
19. Linearizable Token Metadata
   (Diagram: nodes A, B, and C; two concurrent bootstraps each perform a read barrier before reading and updating system.token_metadata, serializing the operations.)
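Conceptually, a read barrier makes the coordinator wait until its local state machine has applied every entry committed so far, so the subsequent read of token metadata is linearizable. A sketch with hypothetical names (not ScyllaDB's internal API):

    # Conceptual read barrier: catch up to the commit index, then read.
    class Group0Node:
        def __init__(self):
            self.commit_index = 0   # advanced by consensus
            self.applied_index = 0  # advanced as entries are applied
            self.token_metadata = {}

        def apply_next(self, log):
            key, value = log[self.applied_index]
            self.token_metadata[key] = value
            self.applied_index += 1

        def read_barrier(self, log):
            # Apply everything committed so far before serving the read.
            while self.applied_index < self.commit_index:
                self.apply_next(log)

        def linearizable_read(self, log, key):
            self.read_barrier(log)
            return self.token_metadata.get(key)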
20. Automatic Coordinator Failover
21. Change in the Data Plane
   ■ A key change in the read/write path is fencing
   ■ Each write is signed with a topology version
   ■ If there is a version mismatch, the write doesn't go through
   (Diagram: requests tagged Req(V1) flow from the request coordinator to the replica; the topology coordinator issues Fence(V1) and Drain(V1), after which new requests carry Req(V2).)
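A minimal fencing sketch (illustrative names only): the replica rejects writes stamped with a topology version older than its current fence, so requests planned against stale topology cannot land after the fence is raised:

    # Minimal fencing sketch: reject writes carrying a stale topology version.
    class FencedReplica:
        def __init__(self):
            self.fence_version = 0
            self.data = {}

        def fence(self, version):
            # Raise the fence; older topology versions are rejected from now on.
            self.fence_version = max(self.fence_version, version)

        def write(self, key, value, topology_version):
            if topology_version < self.fence_version:
                raise RuntimeError(
                    f"stale topology version {topology_version} "
                    f"(fence is {self.fence_version}); retry with fresh topology"
                )
            self.data[key] = value

    replica = FencedReplica()
    replica.write("k", "v1", topology_version=1)      # accepted under V1
    replica.fence(2)                                  # topology change fences V1 out
    try:
        replica.write("k", "v2", topology_version=1)  # stale write is rejected
    except RuntimeError:
        replica.write("k", "v2", topology_version=2)  # retried with V2, accepted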
22. The Takeaways
   ■ Requesting multiple operations concurrently is safe
   ■ Sanity checks for operator error
   ■ Failed operations are aborted automatically
   ■ Faster topology changes
23. The Journey Continues
24. Strongly Consistent Tables
   ■ system.broadcast_kv_store
   ■ --experimental in 5.2

       UPDATE system.broadcast_kv_store
       SET value = {new_value}
       WHERE key = {key}
       [IF value = {value_condition}];

       SELECT value FROM system.broadcast_kv_store
       WHERE key = {key};
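As a usage sketch (assuming the experimental 5.2 syntax shown above; the address and the key/value literals are placeholders), a conditional write and a read over broadcast_kv_store with the Python driver might look like this:

    # Sketch: conditional write and read against system.broadcast_kv_store.
    # Assumes the experimental 5.2 syntax above; values are placeholders.
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()
    try:
        # Unconditional write, linearized through Raft.
        session.execute(
            "UPDATE system.broadcast_kv_store SET value = %s WHERE key = %s",
            ("node-a", "leader"),
        )
        # Conditional write: applies only if the current value still matches.
        result = session.execute(
            "UPDATE system.broadcast_kv_store SET value = %s "
            "WHERE key = %s IF value = %s",
            ("node-b", "leader", "node-a"),
        )
        print("applied:", result.was_applied)
        # Read the current value back.
        row = session.execute(
            "SELECT value FROM system.broadcast_kv_store WHERE key = %s",
            ("leader",),
        ).one()
        print("leader is", row.value)
    finally:
        cluster.shutdown()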
25. ScyllaDB's Journey to Tablets
   (Roadmap across releases 5.0, 5.2, and 5.3: Raft, safe schema changes, safe topology changes, consistent tables, dynamic partitioning, and, eventually, tablets.)
26. Thank You and Stay in Touch
   Konstantin Osipov
   kostja@scylladb.com
   kostja_osipov, kostja
