Cassandra compaction

Published in: Technology

  1. What is Compaction? Kazutaka Tomita (INTHEFOREST Co., Ltd.)
  2. Who is this guy? Kazutaka Tomita (@railute)
     • CEO/CTO, INTHEFOREST Co., Ltd.
     • Consulting for Apache Cassandra and Apache Spark systems
     • Cassandra support in Japan
     • An organizer of Cassandra Summit JPN
     Specialties:
     • RDBMS (Oracle, SQL Server, MySQL, PostgreSQL)
     • Apache Cassandra
     • Apache Spark
     • Apache Hadoop with YARN
     • Other NoSQL databases
     • NLP and text mining for Japanese
  3. Agenda
     • Overview of compaction
     • What compaction does
  4. Overview of Compaction. Three points of Cassandra's compaction:
     • Why is compaction done?
     • When is compaction done?
     • What types of compaction are there?
  5. Why is compaction done? The most important thing: an SSTable is immutable. Updates and deletes are never applied in place, so duplicate, overwritten, or deleted data and tombstones must be purged later.
  6. Writing System for Apache Cassandra (for your reference)
     [Diagram: write path]
     • Coordinator node: if the writing node is alive, send messages to the other node; if not, write a hint.
     • Writing node: receive messages from the coordinator node, then write first to the commit log (disk) and second to the memtable (memory).
     • Flush: the memtable is sorted by token and flushed to an SSTable on disk.
     • Compaction: SSTables are merged.
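The local part of the write path on this slide (commit log first, memtable second, then a flush to an immutable SSTable) can be sketched as a toy model. This is a minimal illustration, not Cassandra's implementation: `ToyNode`, `flush_threshold`, and the tuple-based SSTable are hypothetical names, and rows are sorted by key here rather than by token.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Toy model of one node's local write path (hypothetical, not Cassandra's classes)."""
    flush_threshold: int = 3  # hypothetical: flush once the memtable holds this many keys
    commit_log: list = field(default_factory=list)
    memtable: dict = field(default_factory=dict)
    sstables: list = field(default_factory=list)

    def write(self, key, value):
        # 1st: append to the commit log (durable, on disk in the real system).
        self.commit_log.append((key, value))
        # 2nd: update the in-memory memtable.
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted, immutable SSTable (a tuple here).
        # Simplification: sorted by key; the slide says Cassandra sorts by token.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replaying


node = ToyNode()
for key, value in [("b", 1), ("a", 2), ("c", 3), ("a", 9)]:
    node.write(key, value)
```

After these four writes the node holds one flushed SSTable, and the newer value for `"a"` sits only in the memtable. The older value stays in the SSTable untouched, which is why a later compaction is needed.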
  7. When is compaction done?
     1. Manually
     2. Running in the background
  8. When is compaction done? 1. Manually
     1. nodetool compact: Forces a major compaction on one or more tables. With size-tiered compaction, a major compaction combines each of the pools of repaired and unrepaired SSTables into one repaired and one unrepaired SSTable.
     2. nodetool scrub: Rebuilds SSTables for one or more Cassandra tables.
     3. nodetool cleanup: Cleans up keyspaces and partition keys that no longer belong to a node. Use this command to remove unwanted data after adding a new node to the cluster. Cassandra does not automatically remove data from nodes that lose part of their partition range to a newly added node.
     4. nodetool upgradesstables: Rewrites SSTables for tables that are not running the current version of Cassandra.
  9. When is compaction done? 2. Running in the background
     1. When the daemon starts
     2. After flushing memtables
     3. After streaming
     4. When auto compaction is enabled with nodetool
     5. When the compaction threshold is set with nodetool
  10. What types of compaction are there?
      1. Minor
      2. Major
      3. Single-SSTable compactions
      4. Anticompaction
  11. What types of compaction are there? 1. Minor
      This compaction runs automatically in the background:
      • when the daemon starts
      • after flushing memtables
      • after streaming
  12. What types of compaction are there? 2. Major
      A major compaction is performed only with size-tiered compaction.
      cf. org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy#getMaximalTask
      With the other strategies, "nodetool compact" can still be called, but a major compaction is not executed. Note: a minor compaction is executed instead.
      cf. org.apache.cassandra.db.compaction.DateTieredCompactionStrategy#getMaximalTask
      org.apache.cassandra.db.compaction.LeveledCompactionStrategy#getMaximalTask
  13. What types of compaction are there? 3. Single-SSTable compactions
      This compaction is executed on each SSTable, one at a time.
      • nodetool upgradesstables
      • nodetool scrub
      • nodetool cleanup
  14. What types of compaction are there? 4. Anticompaction
      This compaction is for incremental repairs. After an incremental repair is executed, an anticompaction is triggered. *From version 2.1 onward.
  15. Compaction Strategy
      1. SizeTieredCompactionStrategy: for write-intensive workloads
      2. LeveledCompactionStrategy: for read-intensive workloads
      3. DateTieredCompactionStrategy: for time series data and expiring (TTL) data
  16. Size Tiered Compaction Strategy
      When several SSTables reach a similar size, they are merged. (The default threshold is 4 SSTables.)
      [Diagram: SSTables]
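The "similar size" rule on this slide can be sketched as size bucketing. This is a simplified illustration, not the code in SizeTieredCompactionStrategy: the function names are hypothetical, and the `bucket_low`, `bucket_high`, and `min_threshold` values are assumed to mirror the strategy's usual defaults (0.5, 1.5, and 4).

```python
def bucket_by_size(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of 'similar' size (hypothetical helper)."""
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # A size is 'similar' if it lies within [avg * low, avg * high].
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket.append(size)
                break
        else:
            # No existing bucket fits, so this size starts a new one.
            buckets.append([size])
    return buckets


def compaction_candidates(sizes, min_threshold=4):
    """Return the buckets holding at least min_threshold SSTables."""
    return [b for b in bucket_by_size(sizes) if len(b) >= min_threshold]


sizes_mb = [10, 11, 9, 12, 100, 110, 400]  # made-up SSTable sizes in MB
candidates = compaction_candidates(sizes_mb)
```

With these made-up sizes, only the four small SSTables form a bucket large enough to be merged; the two mid-sized ones and the single large one wait for more SSTables of similar size.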
  17. Leveled Compaction Strategy
      [Diagram: SSTables arranged in Level 0, Level 1, and Level 2; annotation: "Data that is not read very often."]
  18. DateTieredCompactionStrategy
      Default: 1 hour. The basic idea of DTCS is to group SSTables into windows based on how old the data in each SSTable is.
      [Diagram: SSTables grouped into time windows leading up to "now", with 4 SSTables per window]
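The windowing idea can be sketched as grouping SSTables by age. This is a deliberately simplified illustration with fixed one-hour windows; as far as I understand, the real DTCS uses progressively larger windows for older data. `group_into_windows`, the SSTable names, and the timestamps are all hypothetical.

```python
from collections import defaultdict


def group_into_windows(sstable_timestamps, now, window_seconds=3600):
    """Map window index (0 = newest) to the SSTables whose data falls in it.

    Simplification: every window has the same length (one hour by default).
    """
    windows = defaultdict(list)
    for name, timestamp in sstable_timestamps.items():
        age = now - timestamp
        windows[age // window_seconds].append(name)
    return dict(windows)


now = 20_000  # made-up clock value, in seconds
sstables = {"a": 19_900, "b": 19_000, "c": 18_000, "d": 17_000,
            "e": 15_000, "f": 9_000}
windows = group_into_windows(sstables, now)

# Windows holding at least 4 SSTables (the threshold shown on the slide).
ready = [index for index, names in windows.items() if len(names) >= 4]
```

Here only the newest window collects four SSTables, so it is the only one that would be compacted together in this sketch.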
  19. Merging SSTables by Compaction
      When several SSTables reach a similar size, they are merged. (The default threshold is 4 SSTables.)
      [Diagram: the cells Name: John, Address: Osaka, Address: Tokyo, Tel: xxx-xxx, ages: 20 are merged into Name: John, Address: Tokyo, ages: 20]
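The merge in this last slide can be sketched as a last-write-wins merge over cells. This is a minimal sketch under stated assumptions: the timestamps are invented, the split of cells between the two input SSTables is a guess, and the missing Tel in the result is assumed to come from a tombstone. Real Cassandra also keeps tombstones until gc_grace_seconds has passed, which this sketch ignores.

```python
TOMBSTONE = object()  # hypothetical marker for a deleted cell


def merge_sstables(*sstables):
    """Merge SSTables of {column: (timestamp, value)}, keeping the newest cell."""
    latest = {}
    for sstable in sstables:
        for column, (timestamp, value) in sstable.items():
            if column not in latest or timestamp > latest[column][0]:
                latest[column] = (timestamp, value)
    # Drop deleted cells; the merged SSTable holds live data only.
    return {column: value
            for column, (_, value) in latest.items()
            if value is not TOMBSTONE}


# Assumed inputs: an older SSTable and a newer one that overwrites Address,
# adds ages, and deletes Tel.
older = {"Name": (1, "John"), "Address": (1, "Osaka"), "Tel": (1, "xxx-xxx")}
newer = {"Address": (2, "Tokyo"), "ages": (2, 20), "Tel": (3, TOMBSTONE)}

merged = merge_sstables(older, newer)
```

The overwritten Address and the deleted Tel disappear from the merged output, which matches the result shown on the slide.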