3. Why?
● SSTables immutable
● Get rid of duplicate/overwritten data
● Drop deleted data and tombstones
4. When?
● Manually, nodetool compact / scrub ...
● When we add sstables
○ After flush
○ Once a compaction is done
○ After streaming
● Search for usages of
○ o.a.c.db.compaction.CompactionManager#submitBackground
5. Types of compaction
● Minor - runs automatically in the background
● Major - includes all sstables, only for size tiered compaction
● Single-sstable compactions
○ upgradesstables
○ scrub
○ cleanup
● Anticompaction
○ After incremental repair to split out repaired/unrepaired data
6. Compaction strategies
● Pluggable interface
● Strategies decide
○ what sstables to compact
○ how big they should be
○ what implementation of CompactionTask to use
● Strategies can get notified when adding new sstables
○ Makes it possible to make smarter decisions about which sstables to compact
○ LCS does this to keep track of what sstables are in each level
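To make the pluggable interface concrete, here is a minimal sketch of what a strategy has to decide; SSTable, Task and the method names below are simplified stand-ins, not Cassandra's real signatures:

```java
import java.util.List;
import java.util.Set;

// Simplified stand-ins, not Cassandra's real classes.
class SSTable { long sizeBytes; int level; }
class Task { final List<SSTable> inputs; Task(List<SSTable> inputs) { this.inputs = inputs; } }

interface CompactionStrategy
{
    // Decide which sstables to compact next; an empty list means nothing to do.
    List<SSTable> nextBackgroundCandidates();

    // Decide what implementation of the compaction task should run the merge.
    Task taskFor(List<SSTable> candidates);

    // Notifications when sstables are added/removed (flush, compaction, streaming),
    // so a strategy can keep state -- e.g. LCS tracks which sstables sit in each level.
    void sstableAdded(SSTable added);
    void sstablesRemoved(Set<SSTable> removed);
}
```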
8. LeveledCompactionStrategy
● Keeps levels of non-overlapping sstables
● Each level is 10x the size of the previous one
● All sstables in levels 1+ are about the same size (160MB)
● L0 is the dumping ground: overlapping, often larger sstables
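As a back-of-the-envelope check of the 10x rule, a tiny sketch (the constant matches the 160MB default above; everything else is illustrative):

```java
public class LcsLevelSizes
{
    // 160MB default sstable size from the slide; illustrative only.
    static final long SSTABLE_BYTES = 160L * 1024 * 1024;

    // Level n targets roughly 10^n sstables, so each level is 10x the previous.
    static long levelTargetBytes(int level)
    {
        return SSTABLE_BYTES * (long) Math.pow(10, level);
    }

    public static void main(String[] args)
    {
        for (int level = 1; level <= 3; level++)
            System.out.printf("L%d ~ %,d MB%n", level, levelTargetBytes(level) >> 20);
        // L1 ~ 1,600 MB; L2 ~ 16,000 MB; L3 ~ 160,000 MB
    }
}
```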
9. Tombstones
● Write a tombstone to delete data
● Covers data, but only data that is older than the tombstone
● Drop covered data during compaction
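A minimal sketch of the covering rule, with made-up Cell/Tombstone stand-ins rather than Cassandra's real types:

```java
// Cell and Tombstone are simplified stand-ins, not Cassandra's real types.
class Cell { final long timestamp; Cell(long ts) { timestamp = ts; } }
class Tombstone { final long timestamp; Tombstone(long ts) { timestamp = ts; } }

class Shadowing
{
    // A tombstone covers a cell only if the cell was written at or before the
    // delete; anything written after the delete survives it.
    static boolean covers(Tombstone t, Cell c)
    {
        return c.timestamp <= t.timestamp;
    }

    public static void main(String[] args)
    {
        Tombstone delete = new Tombstone(100);
        System.out.println(covers(delete, new Cell(90)));  // true: older data gets dropped
        System.out.println(covers(delete, new Cell(110))); // false: newer write survives
    }
}
```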
10. When can we drop tombstones?
● Once the tombstone is older than gc_grace_seconds
● When the tombstone is guaranteed to not cover any data on the node
○ All sstables containing the key are included in the compaction
○ The other sstables where the key exists only contain newer data
13. CompactionManager
● submitBackground
○ Trigger minor compaction
○ Fill executor with BackgroundCompactionTasks
● BackgroundCompactionTask
○ Asks the strategy for the next compaction task and runs it
● submitMaximal
○ Major compaction
○ Not blocking, get() the future to block
○ runWithCompactionsDisabled
● OneSSTableOperation
○ Common way to run the single-sstable compactions in parallel
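Roughly how the background submission loop fits together; this is a simplified sketch, not the real CompactionManager code, and nextTask stands in for asking the strategy for work:

```java
import java.util.Optional;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

class BackgroundCompactions
{
    final ExecutorService executor = Executors.newFixedThreadPool(2);
    // Stand-in for asking the strategy for its next background task;
    // empty means the strategy sees nothing worth compacting right now.
    final Supplier<Optional<Runnable>> nextTask;

    BackgroundCompactions(Supplier<Optional<Runnable>> nextTask)
    {
        this.nextTask = nextTask;
    }

    // Called whenever sstables are added (after flush, compaction, streaming).
    void submitBackground()
    {
        executor.submit(() -> {
            Optional<Runnable> task = nextTask.get();
            if (!task.isPresent())
                return;             // nothing to do
            task.get().run();       // the actual compaction
            submitBackground();     // re-check: one compaction can enable another
        });
    }
}
```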
14. CompactionTask
● Gets executed in the CompactionExecutor and does the actual compacting
● Eventually calls runWith(..), which is where the magic happens
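The rough shape of that compaction loop, with heavily simplified stand-in types; the real runWith(..) also deals with disk space, early opening and metadata:

```java
import java.util.Iterator;

// Heavily simplified stand-ins, not Cassandra's real classes.
interface Row { boolean fullyPurged(); }
interface Writer { void append(Row merged); void finish(); }

class CompactionPipeline
{
    // 'merged' would be a MergeIterator built from one scanner per input
    // sstable (slides 20-22): each next() is one merged partition, with
    // duplicates collapsed and droppable tombstones already removed.
    static void runWith(Iterator<Row> merged, Writer writer)
    {
        while (merged.hasNext())
        {
            Row row = merged.next();
            if (!row.fullyPurged())   // rows reduced to nothing are skipped
                writer.append(row);
        }
        writer.finish();              // seal data, index and metadata files (slide 18)
    }
}
```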
16. CompactionController
● Keep track of overlapping sstables
○ Is the currently compacting key in any other sstable?
● maxPurgeableTimestamp(DecoratedKey key)
○ How old tombstones do we need to keep?
○ Worst case, currently compacting key is the oldest in that sstable
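Putting slides 10 and 16 together, a sketch of the purge decision; the parameter names are illustrative, following the slide's maxPurgeableTimestamp idea rather than the exact Cassandra code:

```java
class PurgeCheck
{
    // localDeletionTime: seconds timestamp of when the tombstone was created.
    // gcBefore: "now" minus gc_grace_seconds.
    // maxPurgeableTimestamp: per-key answer from the CompactionController.
    static boolean canDrop(long tombstoneTimestamp,
                           int localDeletionTime,
                           int gcBefore,
                           long maxPurgeableTimestamp)
    {
        boolean gcGraceElapsed = localDeletionTime < gcBefore;
        // If any sstable outside this compaction could still hold data older
        // than the tombstone, the tombstone has to survive to keep covering it.
        boolean coversNothingElsewhere = tombstoneTimestamp < maxPurgeableTimestamp;
        return gcGraceElapsed && coversNothingElsewhere;
    }
}
```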
18. SSTableWriter
● Writes sstables…
● Give it rows and it writes the index, data file, sstable metadata files, etc.
● openEarly(..)
○ link index and data files
○ in-memory-fake the rest of the files
● Collect SSTable metadata
19. SSTable metadata
● Collected whenever an sstable is written
● StatsMetadata
○ Kept on-heap
○ min/maxTimestamp
○ min/maxColumnNames
○ sstableLevel
● CompactionMetadata
○ Deserialized when needed
○ ancestors
○ cardinalityEstimator - HyperLogLog signature
● ValidationMetadata
○ Used to validate sstables when opening
20. Iterators all the way down
[Diagram: compacting two sstables]
Before: a: 1 2 3 | a: 2 5 7 | b: 2 3 5 | b: 2 4 5 | d: .. | e: ..
After: a: 1 2 3 5 7 | b: 2 3 4 5 | d: .. | e: ..
● “Partition iterator” for each sstable (SSTableScanner)
● “Cell iterator” for each partition (OnDiskAtomIterator)
● MergeIterator (MI) that takes a number of (sorted) iterators and merges them
● One MI for sstables that merges partitions
● One MI for each partition that merges cells
21. MergeIterator
● Interesting implementation is ManyToOne
● Merges many sorted iterators into one
● Reducer
○ reduce(..) gets called for every version that should be reduced
○ getReduced() gets called when all versions with the same name/priority/value have been reduce():ed
22. MergeIterator
1. Call next()
2. Poll one item out of the PQ
3. Reducer.reduce(..)
4. Goto 2, until we find an item that differs
5. Call next() on the iterators you polled
6. Re-add the iterators to the PQ
7. Return Reducer.getReduced()
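A sketch of that loop as a generic many-to-one merge; Candidate and Reducer below are simplified illustrations, not Cassandra's exact classes:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

class ManyToOne<In, Out> implements Iterator<Out>
{
    // Each queue entry is one source iterator plus its current head item.
    static class Candidate<In>
    {
        final Iterator<In> source;
        In head;
        Candidate(Iterator<In> source) { this.source = source; }
        boolean advance() { head = source.hasNext() ? source.next() : null; return head != null; }
    }

    interface Reducer<In, Out>
    {
        void reduce(In version);  // called once per equal item
        Out getReduced();         // called when all equal items are consumed
    }

    final PriorityQueue<Candidate<In>> queue;
    final Comparator<In> comparator;
    final Reducer<In, Out> reducer;

    ManyToOne(List<Iterator<In>> sources, Comparator<In> comparator, Reducer<In, Out> reducer)
    {
        this.comparator = comparator;
        this.reducer = reducer;
        this.queue = new PriorityQueue<>((a, b) -> comparator.compare(a.head, b.head));
        for (Iterator<In> source : sources)
        {
            Candidate<In> c = new Candidate<>(source);
            if (c.advance())
                queue.add(c);
        }
    }

    public boolean hasNext() { return !queue.isEmpty(); }

    public Out next()
    {
        // Steps 2-4: poll equal heads and feed every version to the reducer.
        List<Candidate<In>> polled = new ArrayList<>();
        Candidate<In> first = queue.poll();
        polled.add(first);
        reducer.reduce(first.head);
        while (!queue.isEmpty() && comparator.compare(queue.peek().head, first.head) == 0)
        {
            Candidate<In> same = queue.poll();
            polled.add(same);
            reducer.reduce(same.head);
        }
        // Steps 5-6: advance the polled iterators and put them back.
        for (Candidate<In> c : polled)
            if (c.advance())
                queue.add(c);
        // Step 7: hand back the merged result.
        return reducer.getReduced();
    }
}
```

Nesting one such merge per partition (for cells) inside a top-level merge over the SSTableScanners (for partitions) gives the structure from slide 20.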
24. LazilyCompactedRow
● “Lazy” because we don’t deserialize until we need to
● Uses a MergeIterator to merge the rows
● Drops tombstones if possible
○ Uses CompactionController for this