Compaction
Agenda 
● Overview 
● Compaction strategies 
● Tombstones 
● Code walkthrough
Why? 
● SSTables immutable 
● Get rid of duplicate/overwritten data 
● Drop deleted data and tombstones
When? 
● Manually, nodetool compact / scrub ... 
● When we add sstables 
○ After flush 
○ Once a compaction is done 
○ After streaming 
● Search for usages of o.a.c.db.compaction.CompactionManager#submitBackground
Types of compaction 
● Minor - runs automatically in the background 
● Major - includes all sstables, only for size tiered compaction 
● Single-sstable compactions 
○ upgradesstables 
○ scrub 
○ cleanup 
● Anticompaction 
○ After incremental repair to split out repaired/unrepaired data
Compaction strategies 
● Pluggable interface 
● Strategies decide 
○ what sstables to compact 
○ how big they should be 
○ what implementation of CompactionTask to use 
● Strategies can get notified when new sstables are added 
○ Makes it possible to make smart decisions about which sstables to compact 
○ LCS does this to keep track of what sstables are in each level
SizeTieredCompactionStrategy 
● Combines sstables based on their size 
● Skips sstables that are ‘cold’ - not read much
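A minimal sketch of "combines sstables based on their size", not the Cassandra implementation. It assumes a bucket accepts an sstable whose size lies between 0.5x and 1.5x of the bucket's running average (the documented defaults for bucket_low and bucket_high), and it ignores the coldness check. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: groups sstable sizes into buckets of similar size.
final class SizeTieredSketch {
    static final double BUCKET_LOW = 0.5;   // assumed lower bound, relative to the bucket average
    static final double BUCKET_HIGH = 1.5;  // assumed upper bound, relative to the bucket average

    static List<List<Long>> buckets(List<Long> sizes) {
        List<Long> sorted = new ArrayList<>(sizes);
        Collections.sort(sorted);

        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        double average = 0;

        for (long size : sorted) {
            boolean fits = !current.isEmpty()
                    && size >= average * BUCKET_LOW
                    && size <= average * BUCKET_HIGH;
            // Close the current bucket when the next sstable is too different in size.
            if (!fits && !current.isEmpty()) {
                result.add(current);
                current = new ArrayList<>();
            }
            current.add(size);

            // Recompute the average size of the bucket being filled.
            long sum = 0;
            for (long s : current) sum += s;
            average = (double) sum / current.size();
        }
        if (!current.isEmpty()) result.add(current);
        return result;
    }
}
```

With sizes 10, 12, 11, 100, 110 and 1000 this yields three buckets. A real strategy would then only pick buckets holding enough sstables (min_threshold) as compaction candidates.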
LeveledCompactionStrategy 
● Keeps levels of non-overlapping sstables 
● Each level is 10x the size of the previous one 
● All sstables in levels 1+ are about the same size (160MB) 
● L0 is the dumping ground: its sstables may overlap and can be larger
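The "10x per level" rule can be turned into simple arithmetic. The sketch below assumes L1 holds ten sstables' worth of data and that every further level is ten times larger; that starting point is an assumption for illustration, and the class name is hypothetical.

```java
// Hypothetical sketch of level capacities under the "10x per level" rule.
final class LeveledSizesSketch {
    static final long SSTABLE_BYTES = 160L * 1024 * 1024; // 160MB target size from the slide

    // Assumption: L1 holds 10 sstables of data, L2 holds 100, L3 holds 1000, ...
    static long maxBytesForLevel(int level) {
        if (level < 1) {
            throw new IllegalArgumentException("L0 has no fixed capacity in this sketch");
        }
        long bytes = SSTABLE_BYTES;
        for (int i = 0; i < level; i++) {
            bytes *= 10;
        }
        return bytes;
    }

    // Number of 160MB sstables that fit in a level.
    static long sstablesInLevel(int level) {
        return maxBytesForLevel(level) / SSTABLE_BYTES;
    }

    public static void main(String[] args) {
        for (int level = 1; level <= 4; level++) {
            System.out.println("L" + level + ": " + sstablesInLevel(level) + " sstables");
        }
    }
}
```

Under these assumptions L3 already holds 1000 sstables, roughly 156GB, which is why a handful of levels is enough for most nodes.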
Tombstones 
● Write a tombstone to delete data 
● Covers data, but only data that is older than the tombstone 
● Drop covered data during compaction
When can we drop tombstones? 
● Once the tombstone has existed for gc_grace_seconds 
● When the tombstone is guaranteed to not cover any data on the node, i.e. either 
○ All sstables containing the key are included in the compaction, or 
○ The other sstables where the key exists only contain newer data
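Both conditions above can be written as one predicate. This is a sketch under stated assumptions, not Cassandra's code: the Tombstone class is hypothetical, timestamps are plain longs, and maxPurgeableTimestamp stands for the oldest timestamp the key could have in any sstable outside the compaction.

```java
// Hypothetical sketch of the "can we drop this tombstone?" decision.
final class TombstonePurgeSketch {
    // Hypothetical tombstone: its write timestamp and the wall-clock second it was created.
    static final class Tombstone {
        final long timestamp;
        final int localDeletionTime;

        Tombstone(long timestamp, int localDeletionTime) {
            this.timestamp = timestamp;
            this.localDeletionTime = localDeletionTime;
        }
    }

    static boolean canDrop(Tombstone t, int nowSeconds, int gcGraceSeconds, long maxPurgeableTimestamp) {
        // Condition 1: the tombstone has existed for at least gc_grace_seconds.
        boolean olderThanGcGrace = t.localDeletionTime < nowSeconds - gcGraceSeconds;
        // Condition 2: anything the key has in sstables outside this compaction is newer
        // than the tombstone, so the tombstone cannot be covering data there.
        boolean coversNothingElsewhere = t.timestamp < maxPurgeableTimestamp;
        return olderThanGcGrace && coversNothingElsewhere;
    }
}
```

If every sstable containing the key is part of the compaction, maxPurgeableTimestamp can be treated as Long.MAX_VALUE and only the gc_grace_seconds check matters.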
Code walkthrough
CompactionManager 
● submitBackground 
○ Trigger minor compaction 
○ Fill executor with BackgroundCompactionTasks 
● BackgroundCompactionTask 
● submitMaximal 
○ Major compaction 
○ Non-blocking; call get() on the returned future to block 
○ runWithCompactionsDisabled 
● OneSSTableOperation 
○ Common way to run the single-sstable compactions in parallel
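The "submit returns a future, get() blocks" pattern is plain java.util.concurrent. The sketch below only illustrates that pattern; the class, the methods and the fake "compaction" task are hypothetical and are not CompactionManager's API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: submitting work is non-blocking, Future.get() blocks.
final class SubmitAndBlockSketch {
    // Stand-in for a submit call: hands the task to an executor and returns immediately.
    static Future<Integer> submit(ExecutorService executor, final int sstableCount) {
        // The pretend compaction just reports how many sstables it was given.
        Callable<Integer> task = () -> sstableCount;
        return executor.submit(task);
    }

    static int submitAndWait(int sstableCount) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> future = submit(executor, sstableCount); // returns right away
            return future.get();                                      // waits for the task to finish
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }
}
```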
CompactionTask 
● Gets executed in the CompactionExecutor and does the actual compacting 
● Eventually calls runWith(..), which is where the magic happens
CompactionController 
● Keep track of overlapping sstables 
○ Is the currently compacting key in any other sstable? 
● maxPurgeableTimestamp(DecoratedKey key) 
○ How old do tombstones have to be before we can drop them? 
○ Worst case: assume the currently compacting key is the oldest data in that sstable
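A minimal sketch of the idea behind maxPurgeableTimestamp, assuming each overlapping sstable can answer "might you contain this key?" and exposes its minimum timestamp. The OverlappingSSTable class is hypothetical, and a plain Set stands in for whatever key lookup a real sstable would use.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: how far back could data for this key go outside the compaction?
final class PurgeControllerSketch {
    // Hypothetical view of an sstable that is NOT part of the current compaction.
    static final class OverlappingSSTable {
        final Set<String> keys;   // stand-in for a key lookup
        final long minTimestamp;  // oldest timestamp of anything stored in the sstable

        OverlappingSSTable(Set<String> keys, long minTimestamp) {
            this.keys = keys;
            this.minTimestamp = minTimestamp;
        }
    }

    // Tombstones with a timestamp below the returned value cannot cover data in the
    // overlapping sstables. We do not know the key's real timestamp in each sstable,
    // so we take the worst case: the key holds that sstable's oldest data.
    static long maxPurgeableTimestamp(String key, List<OverlappingSSTable> overlapping) {
        long min = Long.MAX_VALUE;
        for (OverlappingSSTable sstable : overlapping) {
            if (sstable.keys.contains(key)) {
                min = Math.min(min, sstable.minTimestamp);
            }
        }
        return min;
    }
}
```

When no overlapping sstable contains the key, the result is Long.MAX_VALUE, meaning no tombstone for that key is held back by data elsewhere.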
SSTableRewriter 
● Open compaction results early
SSTableWriter 
● Writes sstables… 
● Give it rows, it writes index, data file, sstable metadata files etc 
● openEarly(..) 
○ link index and data files 
○ in-memory-fake the rest of the files 
● Collect SSTable metadata
SSTable metadata 
● Collected whenever an sstable is written 
● StatsMetadata 
○ Kept on-heap 
○ min/maxTimestamp 
○ min/maxColumnNames 
○ sstableLevel 
● CompactionMetadata 
○ Deserialized when needed 
○ ancestors 
○ cardinalityEstimator - HyperLogLog signature 
● ValidationMetadata 
○ Used to validate sstables when opening
Iterators all the way down 
Input (partition key, then cells; the same partition can appear in several sstables): 
  a: 1 2 3 
  a: 2 5 7 
  b: 2 3 5 
  b: 2 4 5 
  d: .. 
  e: .. 
Merged output: 
  a: 1 2 3 5 7 
  b: 2 3 4 5 
  d: .. 
  e: .. 
● “Partition iterator” for each sstable (SSTableScanner) 
● “Cell iterator” for each partition (OnDiskAtomIterator) 
● MergeIterator (MI) that takes a number of (sorted) iterators and merges them 
● One MI for sstables that merges partitions 
● One MI for each partition that merges cells
MergeIterator 
● Interesting implementation is ManyToOne 
● Merges many sorted iterators into one 
● Reducer 
○ reduce(..) gets called for every version that should be reduced 
○ getReduced() gets called when all versions with the same name/priority/value have been reduce()d
MergeIterator 
1. Call next() 
2. Poll one item out of the PQ 
3. Reducer.reduce(..) 
4. Goto 2, until we find an item that differs 
5. Call next() on the iterators you polled 
6. Re-add the iterators to the PQ 
7. Return Reducer.getReduced()
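The seven steps can be sketched as a many-to-one merge over a priority queue. This is an illustration, not Cassandra's MergeIterator: every name is hypothetical, the Reducer interface is reduced to the two methods named on the slide, and the result is collected into a list instead of being returned lazily from next().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a many-to-one merge driven by a priority queue (PQ).
final class MergeSketch {
    interface Reducer<T> {
        void reduce(T item);  // called once per equal version
        T getReduced();       // called once all equal versions were reduced
    }

    // Simplest possible reducer: equal versions collapse into a single item.
    static <T> Reducer<T> keepOne() {
        return new Reducer<T>() {
            private T kept;
            public void reduce(T item) { kept = item; }
            public T getReduced() { T result = kept; kept = null; return result; }
        };
    }

    // A sorted source iterator plus its current head, so it can be ordered in the PQ.
    static final class Candidate<T extends Comparable<T>> implements Comparable<Candidate<T>> {
        final Iterator<T> source;
        T head;

        Candidate(Iterator<T> source) { this.source = source; this.head = source.next(); }

        boolean advance() {
            if (!source.hasNext()) return false;
            head = source.next();
            return true;
        }

        public int compareTo(Candidate<T> other) { return head.compareTo(other.head); }
    }

    static <T extends Comparable<T>> List<T> merge(List<Iterator<T>> sources, Reducer<T> reducer) {
        PriorityQueue<Candidate<T>> queue = new PriorityQueue<>();
        for (Iterator<T> source : sources) {
            if (source.hasNext()) queue.add(new Candidate<>(source));
        }

        List<T> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            T smallest = queue.peek().head;
            List<Candidate<T>> polled = new ArrayList<>();
            // Steps 2-4: poll and reduce until the head of the PQ differs.
            while (!queue.isEmpty() && queue.peek().head.compareTo(smallest) == 0) {
                Candidate<T> candidate = queue.poll();
                reducer.reduce(candidate.head);
                polled.add(candidate);
            }
            // Steps 5-6: advance the polled iterators and re-add those with data left.
            for (Candidate<T> candidate : polled) {
                if (candidate.advance()) queue.add(candidate);
            }
            // Step 7: emit the reduced item.
            merged.add(reducer.getReduced());
        }
        return merged;
    }
}
```

Merging the cell lists 1 2 3 and 2 5 7 of partition a this way gives 1 2 3 5 7: the two versions of cell 2 reach the reducer together and come out as one.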
CompactionIterable 
● Creates LazilyCompactedRow 
● Simple Reducer
LazilyCompactedRow 
● “Lazy” because we don’t deserialize until we need to 
● Uses a MergeIterator to merge the rows 
● Drops tombstones if possible 
○ Uses CompactionController for this

Cassandra 2.1 boot camp, Compaction
