What if there were a new, better, more efficient way to handle compactions in Scylla? One that lets you use your storage much more efficiently? Enter Scylla’s unique Incremental Compaction Strategy (ICS). Get a comparison of common compaction strategies and a technical deep dive into ICS. You’ll learn why ICS will become the new standard for compaction, including an overview of how much disk space you can save with it.
2. Presenter
Benny Halevy, Core Storage Group Manager
■ Leads the storage software development team at ScyllaDB.
■ Working on operating systems and distributed file systems for
over 20 years.
■ Before Scylla, led software development for GSI Technology,
providing a hardware/software solution for deep learning and
similarity search using in-memory computing technology.
■ Previously co-founded Tonian (later acquired by Primary Data)
and led it as CTO, developing a distributed file server based on
the pNFS protocol delivering highly scalable performance and
dynamic, out-of-band data placement control.
■ Before Tonian, was the lead architect for the pNFS protocol at Panasas.
4. Log-structured Writes
■ Changes to the data are:
● First, recorded in memory, then
● Flushed into SSTables.
■ Updates accumulate over time
● in different SSTables
● Having several versions of the same cell is called
“write amplification”
[Diagram: Updates → MemTable → SSTables]
5. SSTables
■ Immutable
■ Contain changes to data
● A.k.a mutations
■ Sorted (“Sorted Strings Table”)
■ Have metadata, like:
● Index, Statistics, Filter
🛈 There is no static view of the database
6. Reading Data
■ Requires reading all relevant SSTables
● Applying the live mutations
● A Bloom filter is used to skip SSTables that cannot contain the requested partition
■ Consolidating mutations from many
SSTables is expensive
● We call that “read amplification”
7. Why is Compaction Needed?
■ SSTables are immutable
● We can’t just keep writing updates
● Obsolete data needs to be deleted
● Reduce write amplification
■ Data may be scattered around
● We want to consolidate it
● Reduce read amplification
8. Compaction Fundamentals
1. Compaction first selects a set of SSTables to process,
● based on the Compaction Strategy.
2. It then reads the SSTables, and
● writes the compacted output
● while eliminating overwrites, deleted and expired data.
3. Eventually, when the output SSTables are
sealed and safely stored on storage
● the input SSTables can be finally deleted.
🛈 Note that compaction requires temporary space,
since the input SSTables must not be deleted until their compaction completes.
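The three steps, and why they need temporary space, can be sketched as a toy model. The `SSTable` class, the `compact` function, and the one-unit-per-cell size model are hypothetical illustrations, not Scylla's implementation; step 1 (selection) is left to the caller because it depends on the compaction strategy.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass


@dataclass
class SSTable:
    # Hypothetical stand-in for an immutable SSTable: (key, timestamp, value)
    # cells sorted by key, at most one cell per key.
    name: str
    cells: list

    @property
    def size(self) -> int:
        # Toy size model: one storage unit per cell.
        return len(self.cells)


def compact(inputs: list[SSTable], output_name: str,
            disk: dict[str, SSTable]) -> tuple[SSTable, int]:
    """Compact already-selected inputs (step 1 is the strategy's job).

    `disk` maps SSTable names to tables and stands in for storage.
    Returns the output SSTable and the peak space used on `disk`.
    """
    # Step 2: read all inputs as one key-ordered stream; for equal keys,
    # the newest cell (highest timestamp) comes first.
    stream = heapq.merge(*(t.cells for t in inputs),
                         key=lambda cell: (cell[0], -cell[1]))
    out_cells = []
    for key, ts, value in stream:
        if not out_cells or out_cells[-1][0] != key:
            out_cells.append((key, ts, value))  # older versions are dropped
    output = SSTable(output_name, out_cells)

    # Step 3: once the output is sealed, inputs and output coexist on disk;
    # this is the temporary-space peak.
    disk[output_name] = output
    peak = sum(t.size for t in disk.values())

    # Only now can the inputs be deleted.
    for t in inputs:
        del disk[t.name]
    return output, peak
```

For example, compacting two 2-cell SSTables that share key `"a"` yields a 3-cell output, and storage briefly holds all 7 cells before the inputs are removed.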
9. Compaction Fundamentals
■ Which mutations can be eliminated?
● Overwritten
● Expired (by TTL)
● Deleted (by tombstone / column deletion)
● Droppable tombstones
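[Figure: example compaction. Input cells a, a’, b, c, !c, !d and !z are compacted into a’ b !c !d:
● [a] is overwritten by [a’]
● [b] is newly written
● [c] is deleted by [!c]
● [!d] is a live tombstone
● [!z] is a droppable tombstone, and is purged (“poof!”)]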
🛈 Note that tombstones are kept around for gc_grace_seconds
until they are garbage-collected, to prevent data resurrection.
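The elimination rules above can be sketched per cell. This is a simplified model under stated assumptions: `Cell`, `compact_cells` and the 10-day `GC_GRACE_SECONDS` value are hypothetical; a live cell past its TTL is modeled as turning into a tombstone; and the sketch assumes every SSTable holding the keys is part of the compaction, since purging a tombstone while older data survives elsewhere could resurrect that data.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional

GC_GRACE_SECONDS = 864_000  # assumed value (10 days), for illustration only


@dataclass(frozen=True)
class Cell:
    key: str
    timestamp: int                       # write timestamp; newest version wins
    value: Optional[str]                 # None marks a tombstone
    deletion_time: Optional[int] = None  # tombstone time, or TTL expiry of a live cell


def compact_cells(cells: list[Cell], now: int,
                  gc_grace: int = GC_GRACE_SECONDS) -> list[Cell]:
    # Overwritten / deleted: keep only the newest version of each key, so a
    # newer tombstone shadows (eliminates) an older live cell.
    newest: dict[str, Cell] = {}
    for c in cells:
        if c.key not in newest or c.timestamp > newest[c.key].timestamp:
            newest[c.key] = c
    out = []
    for key in sorted(newest):
        c = newest[key]
        # Expired (by TTL): a live cell past its expiry becomes a tombstone.
        if c.value is not None and c.deletion_time is not None and c.deletion_time <= now:
            c = replace(c, value=None)
        # Droppable tombstone: purged once gc_grace has elapsed.
        if c.value is None and c.deletion_time + gc_grace <= now:
            continue
        out.append(c)
    return out
```

Feeding it the figure's cells (with [!z] older than `gc_grace`) leaves a’, b, !c and !d.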
10. Legacy Compaction Strategies - STCS
Scylla offers a choice of compaction strategies for different workloads.
ICS is based on the following two common strategies:
■ Size-Tiered Compaction Strategy (STCS)
● STCS organizes SSTables into tiers,
● based on their size,
● on an exponential scale
■ When several SSTables are compacted
● A single SSTable is created
● It may be as large as all of the inputs combined
■ In that case, it moves up to the next tier
● Or it may become much smaller due to deletes and
expirations
● Potentially dropping to a lower tier.
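Size tiering can be sketched as grouping SSTables of similar size into buckets and compacting a bucket once it holds enough SSTables. The parameter names mirror the STCS options `bucket_low`, `bucket_high`, `min_sstable_size`, `min_threshold` and `max_threshold`, but the defaults and the functions `stcs_buckets` and `pick_bucket` are assumptions for illustration; in particular, picking the first eligible bucket is a simplification of how a real implementation chooses among candidates.

```python
from __future__ import annotations


def stcs_buckets(sizes_mb: list[float], bucket_low: float = 0.5,
                 bucket_high: float = 1.5,
                 min_sstable_size_mb: float = 50) -> list[list[float]]:
    """Group SSTable sizes into size tiers ("buckets")."""
    buckets: list[list[float]] = []
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # Join a tier whose average size is close to this SSTable's size;
            # all very small SSTables share a single tier.
            similar = bucket_low * avg <= size <= bucket_high * avg
            both_small = size < min_sstable_size_mb and avg < min_sstable_size_mb
            if similar or both_small:
                bucket.append(size)
                break
        else:
            # No existing tier fits: this SSTable starts a new one.
            buckets.append([size])
    return buckets


def pick_bucket(buckets: list[list[float]], min_threshold: int = 4,
                max_threshold: int = 32) -> list[float] | None:
    """Pick the first tier holding at least min_threshold SSTables."""
    for bucket in buckets:
        if len(bucket) >= min_threshold:
            return bucket[:max_threshold]
    return None
```

With sizes of 10, 20, 30, 40, 200, 210, 220 and 900 MB, the four small SSTables form the first tier to be compacted; their output, about 100 MB or less, then belongs to a different tier.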
11. STCS Space Amplification
■ STCS requires disk space of at least twice the data size
■ This overhead is called space amplification
■ The main factors are:
● Temporary space needed during compaction
● Accumulation of updates and deletes
across different tiers
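A back-of-the-envelope model shows how both factors add up. The function `stcs_space_amplification` and its parameters are hypothetical; the model considers a single compaction at a time and ignores SSTable metadata such as indexes and filters.

```python
def stcs_space_amplification(on_disk_gb: float, input_gb: float,
                             live_gb: float, shrink: float = 0.0) -> float:
    """Worst-case space amplification at the end of one STCS compaction.

    on_disk_gb: everything stored, including obsolete versions across tiers
    input_gb:   total size of the SSTables being compacted
    live_gb:    size of the data with all obsolete versions removed
    shrink:     fraction of the input eliminated (overwrites, deletes, TTL)
    """
    # The inputs stay on disk until the output is sealed, so at the end of
    # the compaction the whole output sits next to all existing SSTables.
    output_gb = input_gb * (1.0 - shrink)
    return (on_disk_gb + output_gb) / live_gb
```

Compacting all of 500 GB of live data at once gives `(500 + 500) / 500 = 2.0`, the 2x figure above; if obsolete versions have also accumulated across tiers (say 800 GB on disk for 600 GB of live data), the peak rises above 2x even though the output shrinks.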
12. Legacy Compaction Strategies - LCS
Leveled Compaction Strategy (LCS)
■ Compaction is triggered when level i has more than 10^i SSTables
■ LCS picks one SSTable from level i, with size X, to compact
■ It then finds the roughly 10 SSTables in the next level
● that overlap with this SSTable
● and compacts all of them together
■ It writes the resulting run
● to the next level
● The run size is bounded by (1+10)*X
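The LCS selection step can be sketched with integer key ranges. Names such as `SST` and `pick_lcs_compaction` are hypothetical, the sketch follows the slide's 10^i rule for every level (it does not special-case level 0), and it always picks the first SSTable of the level, whereas a real implementation chooses more carefully.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SST:
    first_key: int
    last_key: int


def level_capacity(i: int) -> int:
    # Per the slide: level i may hold up to 10^i SSTables.
    return 10 ** i


def overlaps(a: SST, b: SST) -> bool:
    # Two key ranges intersect unless one ends before the other begins.
    return not (a.last_key < b.first_key or b.last_key < a.first_key)


def pick_lcs_compaction(levels: list[list[SST]], i: int) -> list[SST] | None:
    """Return the inputs of one LCS compaction out of level i, or None."""
    if len(levels[i]) <= level_capacity(i):
        return None
    picked = levels[i][0]  # simplification: always the first SSTable
    next_level = levels[i + 1] if i + 1 < len(levels) else []
    # The picked SSTable plus every next-level SSTable it overlaps.
    return [picked] + [t for t in next_level if overlaps(picked, t)]
```

When the next level's SSTables are about 10x narrower in key range, the picked SSTable overlaps about 10 of them, so one compaction reads and rewrites roughly (1+10)*X bytes.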
13. Legacy Compaction Strategies - LCS
■ LCS limits space amplification
■ But it results in higher write amplification.
15. ICS In a Nutshell
■ We observed problems with legacy compaction strategies:
● STCS has high space amplification (and low write amplification)
● LCS has high write amplification (and low space amplification)
■ We wanted to benefit from both approaches
■ By borrowing SSTable Runs from LCS
■ And applying them over size-tiers
🛈 In essence, ICS replaces
● increasingly larger SSTables with
● increasingly longer SSTable runs
16. SSTable Runs
■ Expansion of the SSTable concept
■ Composed of a sorted set of SSTables
■ The SSTables are non-overlapping
● Those are called “Fragments”
[Diagram: SSTable run whose fragments cover key ranges a, b, …, z]
🛈 A run is equivalent to
● a large SSTable
● split into several smaller SSTables
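Why a run behaves like one large SSTable can be sketched as a data structure: because fragments are sorted and non-overlapping, a key lookup needs at most one fragment, found by binary search over the fragments' first keys. The `Fragment` and `SSTableRun` classes are hypothetical illustrations, not Scylla's types.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    first_key: str
    last_key: str
    name: str


class SSTableRun:
    def __init__(self, fragments: list[Fragment]):
        frags = sorted(fragments, key=lambda f: f.first_key)
        # A run requires disjoint key ranges.
        for prev, cur in zip(frags, frags[1:]):
            if cur.first_key <= prev.last_key:
                raise ValueError(f"fragments {prev.name} and {cur.name} overlap")
        self.fragments = frags
        self._first_keys = [f.first_key for f in frags]

    def fragment_for(self, key: str) -> Fragment | None:
        """Return the only fragment that may contain key, or None."""
        i = bisect.bisect_right(self._first_keys, key) - 1
        if i >= 0 and key <= self.fragments[i].last_key:
            return self.fragments[i]
        return None
```

A read therefore touches one fragment per run, just as it would touch one large SSTable, while each fragment stays small enough to be compacted and deleted on its own.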
17. How Does ICS Work?
■ Remember that:
● Fragments are disjoint
● and sorted with respect to each other
■ So we scan the runs, fragment-by-fragment
■ and compact them incrementally
● While deleting exhausted SSTables as we go
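[Diagram: two input runs with fragments A, B, …, Z and a, b, …, z are compacted fragment by fragment into an output run with fragments A+a, B+b, …]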
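Incremental compaction can be sketched as a toy model that counts cells as storage units. `incremental_compact` and its `max_fragment_cells` cap on output fragment size are hypothetical; the sketch assumes non-empty fragments and deletes an input fragment only after it is fully consumed and the output holding its data has been sealed.

```python
from __future__ import annotations

import heapq


def incremental_compact(runs: list[list[list[tuple]]],
                        max_fragment_cells: int) -> tuple[list[list[tuple]], int]:
    """Compact SSTable runs into one output run, fragment by fragment.

    Each run is a list of disjoint, sorted fragments; each fragment is a
    key-sorted list of (key, timestamp, value) cells.
    Returns the output run and the peak number of cells stored.
    """
    sizes = {(r, f): len(frag) for r, run in enumerate(runs) for f, frag in enumerate(run)}
    unread = dict(sizes)           # cells not yet consumed, per input fragment
    exhausted = []                 # fully consumed, awaiting deletion
    on_disk = sum(sizes.values())  # cells currently stored (inputs + output)
    peak = on_disk
    output, current = [], []

    def cells_of(r):
        # Scan a run fragment by fragment, in key order.
        for f, frag in enumerate(runs[r]):
            for key, ts, value in frag:
                yield key, -ts, r, f, value

    def seal():
        nonlocal on_disk
        if current:
            output.append(current.copy())
            current.clear()
        # Exhausted inputs are deleted only once their data has been sealed.
        for frag_id in exhausted:
            on_disk -= sizes[frag_id]
        exhausted.clear()

    for key, neg_ts, r, f, value in heapq.merge(*(cells_of(r) for r in range(len(runs)))):
        if not current or current[-1][0] != key:
            if not output or output[-1][-1][0] != key:
                # First cell seen for a key is the newest; older ones are dropped.
                current.append((key, -neg_ts, value))
                on_disk += 1
                peak = max(peak, on_disk)
        unread[(r, f)] -= 1
        if unread[(r, f)] == 0:
            exhausted.append((r, f))
        if len(current) >= max_fragment_cells:
            seal()
    seal()
    return output, peak
```

Compacting two 8-cell runs that overwrite the same 8 keys peaks at 18 cells in this model, whereas keeping all 16 input cells until the full 8-cell output is written (the STCS approach) would peak at 24.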
18. Case Study
Phases:
1. Write 500GB
2. Overwrite repeatedly
3. Compact
■ Clearly shows ICS’s
improved space amplification
■ Most notably, the 2x peak
seen with STCS major compaction
is gone!
19. Thank you Stay in touch
Any questions? Benny Halevy
bhalevy@scylladb.com
Speaker notes
Changes to data are first recorded in memory and also stored on disk in the commit log.
Data updates need to be compacted frequently, together with unchanged data, which is merely copied over and over again.