Cassandra TK 2014 - Large Nodes
1. CASSANDRA TK 2014
LARGE NODES WITH
CASSANDRA
Aaron Morton
@aaronmorton
Co-Founder & Principal Consultant
www.thelastpickle.com
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
2. About The Last Pickle.
Work with clients to deliver and improve
Apache Cassandra based solutions.
Apache Cassandra Committer, DataStax MVP,
Hector Maintainer, Apache Usergrid
Committer.
Based in New Zealand & USA.
9. Bloom Filter
Stores a bitset used to determine, with a certain probability, whether a key exists in an SSTable.

Size depends on the number of rows and bloom_filter_fp_chance.
11. Bloom Filter Size
[Chart: Bloom Filter size in MB (y-axis, 0 to 1,200) against millions of rows (x-axis, ticks at 1, 10, 100 and 1,000), with one series for bloom_filter_fp_chance 0.01 and one for 0.10.]
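As a rough illustration of how row count and bloom_filter_fp_chance drive the size, the sketch below uses the textbook optimal Bloom filter formula, m = -n ln(p) / (ln 2)^2 bits. This is an assumption: Cassandra sizes its filters with its own calculation, so real numbers may differ. The function name is hypothetical.

```python
import math


def bloom_filter_size_bytes(num_rows: int, fp_chance: float) -> float:
    """Textbook optimal Bloom filter size in bytes (not Cassandra's exact sizing)."""
    bits = -num_rows * math.log(fp_chance) / (math.log(2) ** 2)
    return bits / 8


# Same row counts and fp chances as the chart axes and series.
for rows_millions in (1, 10, 100, 1000):
    for fp_chance in (0.01, 0.10):
        size_mb = bloom_filter_size_bytes(rows_millions * 1_000_000, fp_chance) / 1_000_000
        print(f"{rows_millions:>5}M rows, fp_chance={fp_chance:.2f}: {size_mb:,.0f} MB")
```

Under this formula, 1,000 million rows need roughly 1,200 MB at 0.01 and roughly 600 MB at 0.10, which sits within the range of the chart's y-axis.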
12. Compression Metadata
Stores a long offset into the compressed Data.db file for each chunk_length_kb (default 64) of uncompressed data.

Size depends on the uncompressed data size.
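A minimal sketch of that relationship, assuming one 8-byte long per chunk and ignoring any fixed header in the metadata. The function name and the 1 TiB example size are hypothetical.

```python
def compression_metadata_bytes(uncompressed_bytes: int, chunk_length_kb: int = 64) -> int:
    """Estimate offset storage: one 8-byte long per chunk of uncompressed data."""
    chunk_bytes = chunk_length_kb * 1024
    num_chunks = -(-uncompressed_bytes // chunk_bytes)  # ceiling division
    return num_chunks * 8


one_tib = 2 ** 40  # hypothetical uncompressed data size on one node
print(compression_metadata_bytes(one_tib) / 1024 ** 2, "MiB")
```

With the default chunk length, 1 TiB of uncompressed data works out to 128 MiB of offsets under these assumptions.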
19. Bootstrap.
The joining node requests data from one replica of each token range it will own.

Sending is throttled by stream_throughput_outbound_megabits_per_sec (default 200 megabits/sec, which is 25 MB/sec).
20. Bootstrap.
With RF 3, only three nodes will send data to a bootstrapping node.

Maximum send rate is 75 MB/sec (3 × 25 MB/sec).
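The arithmetic behind that ceiling can be sketched as below. The function names are hypothetical, and the sketch assumes every sending node streams at exactly its outbound throttle.

```python
def megabits_to_megabytes(megabits_per_sec: float) -> float:
    """Convert a rate in megabits/sec to megabytes/sec (8 bits per byte)."""
    return megabits_per_sec / 8


def max_send_rate_mb_per_sec(sending_nodes: int, throttle_megabits: float = 200) -> float:
    """Aggregate rate if every sender runs at its outbound stream throttle."""
    return sending_nodes * megabits_to_megabytes(throttle_megabits)


print(max_send_rate_mb_per_sec(3))  # three senders at the default throttle
```

Raising the throttle or increasing the number of senders are the only two ways to lift the ceiling in this model.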
21. Moving Nodes.
Copy data from the existing node to the new node.

At 50 MB/s, transferring 100 GB takes 33 minutes.
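The 33 minute figure can be checked with a short calculation. This sketch assumes decimal units (1 GB = 1,000 MB) and a sustained transfer rate; the function name is hypothetical.

```python
def transfer_minutes(data_gb: float, rate_mb_per_sec: float) -> float:
    """Minutes needed to copy data_gb at a sustained rate, using 1 GB = 1,000 MB."""
    seconds = data_gb * 1000 / rate_mb_per_sec
    return seconds / 60


print(f"{transfer_minutes(100, 50):.1f} minutes")
```

The same function shows how the copy time scales linearly with the amount of data held on the node.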
27. Comparing Data for Repair.
Calculate the Merkle Tree hash by reading all rows in a Table (Validation Compaction).

Single comparator, throttled by compaction_throughput_mb_per_sec (default 16).
28. Comparing Data for Repair.
Time taken grows as the size of the data per
node grows.
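To make the growth concrete, the sketch below estimates validation time at the default compaction throttle of 16 MB/sec. It assumes the throttle is the only bottleneck and uses decimal units; the data sizes and the function name are hypothetical.

```python
def validation_hours(data_gb: float, throughput_mb_per_sec: float = 16) -> float:
    """Hours to read data_gb once at the compaction throttle (1 GB = 1,000 MB)."""
    return data_gb * 1000 / throughput_mb_per_sec / 3600


# Hypothetical amounts of data per node.
for data_gb in (100, 500, 1000, 3000):
    print(f"{data_gb:>5} GB per node: {validation_hours(data_gb):5.1f} hours")
```

Under these assumptions a node holding 1,000 GB needs more than 17 hours just to read its data for the comparison.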
29. Exchanging Data for Repair.
Ranges of rows with differences are streamed.

Sending is throttled by stream_throughput_outbound_megabits_per_sec (default 200 megabits/sec, which is 25 MB/sec).
39. Moving Node Work Arounds.
Copy a nodetool snapshot while the original node is operational.

Copy only the delta once the original node is stopped.
40. Disk Management Work Arounds.
Use RAID-0 and over-provision nodes in anticipation of failure.

Use RAID-10 and accept the additional cost.
41. Repair Work Arounds.
Only use repair if data is deleted; rely on Consistency Level for distribution.

Run frequent small repairs using token ranges.
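One way to picture small, token-range repairs is to split a range into equal slices and repair each slice separately. The sketch below is an illustration only: it assumes the Murmur3Partitioner token space (-2^63 to 2^63 - 1), splits the whole ring rather than the ranges a particular node owns, and uses a hypothetical keyspace name, my_keyspace. The -st and -et options are nodetool repair's start-token and end-token flags.

```python
MIN_TOKEN = -(2 ** 63)      # assumes Murmur3Partitioner
MAX_TOKEN = 2 ** 63 - 1


def split_token_range(start: int, end: int, parts: int) -> list[tuple[int, int]]:
    """Split (start, end] into contiguous sub-ranges of roughly equal width."""
    step = (end - start) // parts
    bounds = [start + i * step for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


# Print one repair command per slice; my_keyspace is a hypothetical name.
for start_token, end_token in split_token_range(MIN_TOKEN, MAX_TOKEN, 4):
    print(f"nodetool repair -st {start_token} -et {end_token} my_keyspace")
```

In practice the slices would be derived from the token ranges each node actually owns, and the number of slices tuned so that each repair finishes quickly.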
42. Compaction Work Arounds.
Over-provision disk capacity when using SizeTieredCompactionStrategy.

Reduce min_compaction_threshold (default 4) and max_compaction_threshold (default 32) to reduce the number of SSTables per compaction.
45. Memory Management Improvements.
Version 1.2 moved Bloom Filters and Compression Metadata off the JVM Heap to Native Memory.

Version 2.0 moved Index Samples off the JVM Heap.
46. Bootstrap Improvements.
Virtual Nodes increase the number of Token Ranges per node from 1 to 256.

A bootstrapping node can request data from 256 different nodes.