ScyllaDB’s drive towards strongly consistent features continues, and in this talk I cover the upcoming safe topology changes feature: our rethinking of how nodes are added to and removed from a ScyllaDB cluster.
Quickly assembling a fresh cluster, performing topology and schema changes concurrently, restarting a node with a different IP address or configuration – all of this becomes possible thanks to a centralized, yet fault-tolerant, topology change coordinator, the new algorithm we implemented for ScyllaDB 5.3. The next step is to change data placement automatically, adjusting to the load and the distribution of data; I will touch upon these future plans as well.
Raft After ScyllaDB 5.2: Safe Topology Changes
1. Raft After ScyllaDB 5.2: Safe Topology Changes
Konstantin Osipov, Director, Software Engineering
2. Konstantin Osipov
■ Worked on lightweight transactions in ScyllaDB
■ Crazy about distributed systems testing
■ Muscovite and a father of two
3. Presentation Agenda
■ ScyllaDB 5.2
■ Topology on Raft
■ Tablet outlook
6. Strong vs Eventual Consistency

Strong consistency (Node 1 → Node 2):
1. Write from client
2. Write propagated through cluster
3. Internal acknowledgement
4. Acknowledged to client
● requires a live majority
● always returns the latest write

Eventual consistency (Node 1 → Node 2):
1. Write from client
2. Acknowledged to client
3. Eventual write propagation
● highly available
● writes must commute
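To see why eventually consistent writes must commute, consider this minimal cqlsh sketch (ks.profiles is a hypothetical table): the two updates may reach replicas in either order, yet the end state is the same everywhere.

-- Two concurrent writes to the same row; replicas may apply them in
-- different orders, but cell-level last-write-wins timestamps make
-- the merged result identical on every replica.
UPDATE ks.profiles SET name = 'Ada' WHERE id = 1;
UPDATE ks.profiles SET email = 'ada@example.com' WHERE id = 1;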
7. Data vs Metadata
■ Metadata: schema information (table, view, type definitions) and topology information (nodes, tokens). Replicated everywhere; not commutative; changes rarely.
■ Data: static and dynamic rows, counters. Partitioned; commutative; changes frequently.
Consistency of Metadata
[Diagram: a ScyllaDB cluster with replication_factor=2; tokens 1–3 are distributed across the nodes, so data is partitioned while metadata must be identical everywhere]
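As a hedged cqlsh illustration of the split (ks.events is a hypothetical table): the DDL statement below is metadata and, from ScyllaDB 5.2 on, is serialized through Raft, while the INSERT is data and takes the regular, eventually consistent write path.

-- Metadata: a schema change, replicated everywhere and linearized
-- through Raft in ScyllaDB 5.2+.
CREATE TABLE ks.events (id uuid PRIMARY KEY, payload text);

-- Data: a regular write, partitioned across the cluster and
-- replicated on the eventually consistent path.
INSERT INTO ks.events (id, payload) VALUES (uuid(), 'hello');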
8. Raft for Metadata Replication
[Diagram: Nodes A, B, and C, each running a consensus module and a state machine driven by the same replicated log: x←1, y←2, z←3]
11. Enabling Raft
■ 5.2: ON for new clusters
■ 5.3: ON by default
■ In future versions: mandatory
# Use Raft to consistently manage schema information in the cluster.
# Refer to https://docs.scylladb.com/master/architecture/raft.html for
# more details.
consistent_cluster_management: true
12. Recovery After a Lost Majority
■ Isolate dead members
■ Set the recovery key in system.scylla_local (see the sketch below)
■ Restart each node
■ Truncate the Raft state, clear the recovery key
■ Restart to assemble the new cluster
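A minimal cqlsh sketch of the recovery steps, assuming the recovery key is group0_upgrade_state and the Raft log lives in system.raft, as described in the ScyllaDB Raft documentation; exact key and table names may vary by version, so consult the docs linked above.

-- On each node, before the first restart: set the recovery key.
UPDATE system.scylla_local SET value = 'recovery' WHERE key = 'group0_upgrade_state';

-- After all nodes have restarted in recovery mode: drop the old Raft
-- state (your version may have additional system.raft_* tables to
-- truncate) and clear the key, then restart to assemble the new cluster.
TRUNCATE system.raft;
DELETE FROM system.scylla_local WHERE key = 'group0_upgrade_state';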
13. Changing IPs
[Diagram: Region 1 with availability zones 1–3, each holding four nodes with their IP addresses, e.g. 100.1.1.1, 100.2.1.1, 100.3.1.1, …]
16. Limitations We Aim to Address
■ Concurrent topology operations may corrupt your cluster
■ Even when bootstrapping a fresh cluster
■ A failure during a topology change may reduce consistency
■ Topology operations take time
■ Even if streaming is quick
17. Moving Topology Data to Raft
■ Raft group includes all cluster members
■ Token metadata is replicated using Raft
■ No stale topology
18. The Centralized Coordinator
■ Runs alongside the Raft leader
■ Highly available
■ Drives the progress of topology changes
■ Performs linearizable reads and writes of token metadata
■ Request coordinators still use their local view of topology
■ No extra coordination when executing user requests
21. Change in the Data Plane
■ A key change in the read/write path is fencing
■ Each write is signed with the topology version
■ If there is a version mismatch, the write doesn’t go through
[Sequence diagram: a request coordinator sends Req(V1) to a replica; the topology coordinator issues Fence(V1) and Drain(V1) to cut off and flush in-flight version-V1 requests; subsequent requests carry the new version, Req(V2)]
22. The Takeaways
■ Requesting multiple operations concurrently is safe
■ Sanity checks for operator error
■ Failed operations are aborted automatically
■ Faster topology changes
24. Strongly Consistent Tables
■ system.broadcast_kv_store
■ --experimental in 5.2

UPDATE system.broadcast_kv_store
SET value = {new_value}
WHERE key = {key} [IF value = {value_condition}];

SELECT value
FROM system.broadcast_kv_store
WHERE key = {key};
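For illustration, a hedged compare-and-set example against this table; the key name and values are made up.

-- Read the current value of a key.
SELECT value FROM system.broadcast_kv_store WHERE key = 'feature_flag';

-- Conditionally flip it; the update is strongly consistent and
-- applies only if the stored value still matches the condition.
UPDATE system.broadcast_kv_store SET value = 'on'
WHERE key = 'feature_flag' IF value = 'off';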