8. Cassandra vs. Scylla
Throughput: Cassandra cannot utilize multi-core efficiently; Scylla scales linearly (shard-per-core)
Latency: Cassandra is high due to Java and the JVM’s GC; Scylla is low and consistent (own cache)
Complexity: Cassandra needs intricate tuning and configuration; Scylla is auto-tuned, with dynamic scheduling
Admin: Cassandra maintenance impacts performance; Scylla gives an SLA guarantee for admin vs. serving
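To make the shard-per-core idea concrete, here is a minimal sketch. It is not Scylla's code: the function name `shard_for_key`, the hash function, and the shard count are assumptions chosen for illustration. The point is only that every key is owned by exactly one shard, so each core can serve its own data without sharing it with the others.

```python
import hashlib

NUM_SHARDS = 8  # assumption: one shard per CPU core on an 8-core node


def shard_for_key(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to the single shard (core) that owns it (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each shard keeps its own private data; no other shard touches it.
shards = [dict() for _ in range(NUM_SHARDS)]


def write(key: str, value: str) -> None:
    shards[shard_for_key(key)][key] = value


def read(key: str):
    return shards[shard_for_key(key)].get(key)


write("bob", "habitually watches cycling races")
print(read("bob"))
```

Because ownership is decided by the key alone, adding cores adds shards rather than contention, which is the intuition behind "scales linearly".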
9. Scylla Scales UP and OUT
Ingestion time. Every point doubles node size and data per node.
Total data size per node in the i3.16xlarge case is 4.8TB.
[Chart: time to ingest, at 1B, 2B, 4B, 8B, and 16B rows]
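The doubling between points can be written out as a small worked example. This sketch assumes the smallest point is 1B rows at 0.3TB per node, which is how the chart axes read; sizes are held in tenths of a terabyte so the arithmetic stays exact.

```python
START_ROWS_BILLIONS = 1      # smallest point on the chart
START_SIZE_TENTHS_TB = 3     # 0.3TB, stored as tenths of a TB (assumed pairing)
POINTS = 5


def doubling_series(start: int, points: int) -> list:
    """Return `points` values, each double the previous one."""
    return [start * 2 ** i for i in range(points)]


rows = doubling_series(START_ROWS_BILLIONS, POINTS)
sizes = doubling_series(START_SIZE_TENTHS_TB, POINTS)

for r, s in zip(rows, sizes):
    print(f"{r}B rows -> {s / 10}TB per node")
```

Four doublings take 0.3TB to the 4.8TB quoted for the i3.16xlarge case.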
10. Scylla Scales UP and OUT
nodetool compact from quiescent state. Each point doubles node size and data per node
4.8TB i3.16xlarge: 2:11:34
[Chart: time to fully compact the node, at 0.3TB, 0.6TB, 1.2TB, 2.4TB, and 4.8TB per node]
11. “Nodes must be small in case they fail”
+ No, they don’t.
+ Same clusters as previous experiments.
+ Destroy compacted node, rebuild from remaining two.
[Chart: one point each at 1B, 2B, 4B, 8B, and 16B rows; 0.3TB, 0.6TB, 1.2TB, 2.4TB, and 4.8TB per node]
13. About AdGear Samsung Ads
1. AdTech (Advertising Technology) space
2. Started ~10 years ago here in Montreal
▪ Classical Publisher and Advertiser use cases
▪ “Big Data” 250-5k ad impressions / second
3. Then added RTB (Real-Time-Bidding) functionality
▪ Classical buyer/seller use cases
▪ “Big Data” 1M+ transactions / second
4. Then acquired by Samsung VD (Visual Display) while forming Samsung Ads
▪ Classical hardware manufacturer
▪ Unique “Big Data” and opportunities
15. RTB: Value in execution based on data asymmetry
bob: previously purchased a $4k bike
bob: habitually watches cycling races
bob: is male
bob: db timeout
16. Requirements for that database:
1. Key-value(s) store
2. Low-latency reads. Single milliseconds or less
3. High-throughput to keep up with the rest of the stack volume
4. Horizontal scalability
5. Multi-DC by design
6. Behaves well under mixed concurrent loads:
a. Point Reads X Point Writes X Bulk Writes
17. Apache Cassandra at AdGear
1. Used Cassandra since 2010 (v0.6) on sun-jdk (1.6)
a. Those were the days of many operational “WTFs” and gnashing of teeth
i. Fun fact! That JVM enters 100% CPU usage on leap second adjustments!
b. But it worked fairly well all things considered
2. Cassandra matured as our company matured:
a. Now with VTokens, as described in the Dynamo paper. Yay!
b. Now with a LevelDB-like compaction strategy. Yay!
c. Now with off-heap, low-GC-cost data structures. Yay!
d. Now with G1GC on by default. Yay!
e. Now with forked community vs. enterprise roadmaps... Yay?
18. 2017 Tipping Point
Cassandra:
• Slowly losing the latency battle
• Node proliferation
• Load-induced deep JVM bugs beyond our capacity to debug -> instability
• Not particularly interested in enterprise-packaged version of the above
What to do:
• What are modern alternatives?
• Have you guys heard of ScyllaDB? Seen them pop up a few times
• Willing to help POC with great engineering guidance!
• Marketed as:
▪ service cassandra stop
▪ service scylladb start
19. 2017 ScyllaDB at AdGear
Servers: Cassandra 31; Scylla 16
Read latency: Cassandra ~21ms; Scylla <5ms
Backlog and timeouts: Cassandra as high as 15% at peak ☹; Scylla ~0
27. Close to the hardware
• Our own memory allocator
• Our own Disk I/O Scheduler
• Our own CPU Scheduler
• Our own cache, bypasses Linux entirely.
28. The Autonomous NoSQL Database
• SLA for Requests over maintenance operations
• Automatic tuning
• Automatic backpressure
• Scale up/down easily and stream as fast as possible
• Ongoing repair
• Smooths complex data models
29. Throughput is EASY
• Maybe costly, but easy
• Bruce Wayne can get any throughput he wants from any modern NoSQL, including Cassandra.
LATENCY IS HARD
31. Dear Scylla,
What do you call a latency distribution for which the high percentiles are much higher than the average?
34. How fast is my system?
▪ There are two speeds:
o Disk Speed
o CPU/memory speed
▪ What happens when they are not in sync?
latency mean : 51.9
latency median : 9.8
latency 95th percentile : 125.6
latency 99th percentile : 1184.0 (x 22)
latency 99.9th percentile : 1991.2 (x 38)
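The multipliers next to the high percentiles appear to be ratios to the mean; that reading is an assumption, but the arithmetic matches. A minimal sketch using only the figures from the slide (the units are not stated in the source):

```python
# Latency figures copied from the slide; units are not given in the source.
latencies = {
    "mean": 51.9,
    "median": 9.8,
    "p95": 125.6,
    "p99": 1184.0,
    "p99.9": 1991.2,
}


def ratio_to_mean(name: str) -> float:
    """How many times larger than the mean a given percentile is."""
    return latencies[name] / latencies["mean"]


for name in ("p95", "p99", "p99.9"):
    # int() truncates, matching the whole-number multipliers on the slide.
    print(f"{name}: {latencies[name]} is x{int(ratio_to_mean(name))} the mean")
```

The median sits far below the mean while the 99th percentile sits far above it, which is the shape the "Dear Scylla" question is asking about.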
36. The Wall - where is it relevant?
▪ Disk speed slower than CPU speed
o plain slow disk, large payloads
▪ Any other mismatch between resources
o For example, large memory capped by narrow network
42. Tasks in Scylla
[Diagram: traditional stack vs. Scylla’s stack]
Traditional stack: each CPU runs several threads, each with its own stack.
▪ Thread is a function pointer
▪ Stack is a byte array from 64k to megabytes
Scylla’s stack: each CPU runs one scheduler over a queue of promises and tasks.
▪ Promise is a pointer to an eventually computed value
▪ Task is a pointer to a lambda function
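The promise and task definitions above can be sketched in a few lines. This is an illustration of the concept only, not Scylla's implementation; the class names `Promise` and `Scheduler` and their methods are assumptions made for the example.

```python
from collections import deque


class Promise:
    """Placeholder for a value that will be computed later."""

    def __init__(self):
        self.ready = False
        self.value = None
        self._callbacks = []

    def then(self, callback):
        """Run `callback` with the value, now if ready or later when set."""
        if self.ready:
            callback(self.value)
        else:
            self._callbacks.append(callback)

    def set_value(self, value):
        self.ready = True
        self.value = value
        for callback in self._callbacks:
            callback(value)
        self._callbacks.clear()


class Scheduler:
    """One run queue for one CPU; a task is just a callable (a lambda)."""

    def __init__(self):
        self.queue = deque()

    def submit(self, fn):
        promise = Promise()
        # The queued task computes the value and fulfils the promise.
        self.queue.append(lambda: promise.set_value(fn()))
        return promise

    def run(self):
        while self.queue:
            task = self.queue.popleft()
            task()


scheduler = Scheduler()
results = []
scheduler.submit(lambda: 2 + 3).then(results.append)
scheduler.submit(lambda: "row for bob").then(results.append)
scheduler.run()
print(results)
```

Nothing here needs a per-request thread or a per-thread stack: pending work is a queue of small callables, which is the contrast the diagram draws.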
43. The task quota
▪ How often do we check the work queues?
▪ Pre-2.0 defaults too high for latency bound systems
▪ Tasks not respecting it will cause spikes
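A small simulation shows why the quota matters for latency. This is a sketch under stated assumptions: the function `run_reactor`, the microsecond units, and the task costs are all made up for illustration, and tasks are modelled as non-preemptible, so a task that overruns the quota delays the next queue check.

```python
from collections import deque


def run_reactor(task_costs_us, task_quota_us):
    """Run tasks back to back and check the work queues once the quota is used up.

    task_costs_us: simulated run time of each task, in microseconds.
    Returns the longest stretch of time between two queue checks.
    """
    tasks = deque(task_costs_us)
    since_last_check = 0
    worst_gap = 0
    while tasks:
        # A task cannot be interrupted, so it always runs to completion.
        since_last_check += tasks.popleft()
        if since_last_check >= task_quota_us:
            worst_gap = max(worst_gap, since_last_check)
            since_last_check = 0  # this is where the queues would be polled
    return max(worst_gap, since_last_check)


well_behaved = [100] * 20                    # twenty tasks of 100us each
one_hog = [100] * 10 + [5000] + [100] * 9    # one task ignores the quota

print(run_reactor(well_behaved, task_quota_us=500))
print(run_reactor(well_behaved, task_quota_us=2000))
print(run_reactor(one_hog, task_quota_us=500))
```

With well-behaved tasks the worst gap tracks the quota, so a lower quota means new requests are noticed sooner. A single task that ignores the quota sets the worst gap by itself, which is the spike the last bullet warns about.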
47. The I/O Scheduler
• Major component of Scylla since early versions
▪ Central component in The Wall
▪ Getting major improvements for latency workloads in Scylla 2.3
48. The CPU Scheduler
• Since Scylla 2.0, initial version
▪ disabled by default, AdGear enables it.
▪ enabled in our AWS AMI if using i3 instances.
• 2.2 ships with the full solution
▪ Ships this week!
▪ Enabled by default everywhere.
▪ Much better isolation
58. The controllers - coming soon
• Scylla 2.2: SizeTiered compactions are controlled.
• Scylla 2.3: All compaction strategies are controlled.
• Repairs
▪ Repairs already respect latencies very well, but are not as fast as they could be. Controllers will help unleash their full potential
▪ Done: Scylla Enterprise Manager schedules repairs automatically, no human involvement needed
59. Summary
• Scylla inherits the user-visible architecture from Cassandra, a solution that is known to scale out very well
• Scylla employs a radically different internal architecture, allowing it to scale up as well as out while keeping latencies predictable
• Scylla reduces TCO across the board, by also minimizing operational expenses.