Throughput and latency are in constant tension. ScyllaDB CTO and co-founder Avi Kivity will show how high throughput and low latency can both be achieved in a single application by using application-level priority scheduling.
Keeping Latency Low and Throughput High with Application-level Priority Management
1. Brought to you by
Keeping Latency Low and Throughput High with Application-level Priority Management
Avi Kivity
CTO at ScyllaDB
2. Avi Kivity
Co-founder and CTO at ScyllaDB
Creator and ex-maintainer of the Kernel-based Virtual Machine (KVM)
Creator of the Seastar I/O framework
3. Comparing throughput and latency
Throughput computing (~ OLAP)
■ Want to maximize utilization
■ Extensive buffering to hide device/network latency
■ Total time is important
■ Fewer operations; serialization is permissible
Latency computing (~ OLTP)
■ Leave free cycles to absorb bursts
■ Cannot predict which data will be read
■ Often must write synchronously
■ Individual operation time is important
■ Many operations execute concurrently
4. Why mix throughput and latency computing?
■ Run different workloads on the same data - HTAP
● Fewer resources than dedicated clusters
■ Maintenance operations on an OLTP workload
● Garbage collection
● Grooming a Log-Structured Merge Tree (LSM Tree)
● Cluster maintenance - add/remove/rebuild/backup/scrub nodes
6. Isolating tasks in threads
■ Each operation becomes a thread
● Perhaps temporarily borrowed from a thread pool
■ Let the kernel schedule these threads
■ Influence kernel choices with priority
7. Isolating tasks in threads
Advantages
■ Well understood
■ Large ecosystem
Disadvantages
■ Context switches are expensive
■ Communicating priority to the OS is hard
● Priority levels are not meaningful
■ Locking becomes complex and expensive
■ Priority inversion is possible
■ Kernel scheduling granularity may be too coarse
8. Application-level task isolation
■ Every operation is a normal object
■ Operations are multiplexed on a small number of threads
● Ideally one thread per logical core
● Both throughput and latency tasks on the same thread!
■ Concurrency framework assigns tasks to threads
■ Concurrency framework controls order
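The idea above can be sketched in a few lines. This is a minimal illustration, not Seastar's actual API: operations are plain objects held in per-priority-class queues, and a single thread drains them in framework-controlled order, so no kernel context switches or cross-thread locks are involved. The `TaskQueue` and `run_all` names are hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <vector>

// One queue per priority class; tasks are ordinary objects, not threads.
struct TaskQueue {
    const char* name;
    unsigned shares;                          // relative priority weight
    std::deque<std::function<void()>> tasks;
};

// Multiplex every queue onto the calling thread. The framework, not the
// kernel, decides which task runs next, so latency- and throughput-
// oriented work can share one thread per logical core.
void run_all(std::vector<TaskQueue>& queues) {
    for (bool progress = true; progress; ) {
        progress = false;
        for (auto& q : queues) {
            if (!q.tasks.empty()) {
                auto task = std::move(q.tasks.front());
                q.tasks.pop_front();
                task();                       // runs to completion (cooperative)
                progress = true;
            }
        }
    }
}
```

Because each task runs to completion on one thread, most locks around shared per-core state simply become unnecessary.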
9. Application-level task isolation
Advantages
■ Full control
■ Low overhead with cooperative scheduling
■ Many locks become unnecessary
■ Good CPU affinity
■ Fewer surprises from the kernel
Disadvantages
■ Full control
■ Less mature ecosystem
12. Switching queues
■ When queue is exhausted
● Common for latency-sensitive queues
■ When time slice is exhausted
● Throughput oriented queues
● Queue may have more tasks
● Tasks can be preempted
■ Poll for I/O
● io_uring_enter or equivalent
■ Make scheduling decision
● Pick next queue
● Scheduling goal is to keep q_runtime / q_shares equal across queues
● Selection of queue is not round-robin
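The scheduling goal above suggests a simple selection rule: always run the queue whose accumulated runtime divided by its shares is lowest. A hedged sketch (field and function names are hypothetical, not Seastar's):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Queue {
    double runtime_ms;   // CPU time this queue has consumed so far
    unsigned shares;     // configured priority weight
};

// Pick the queue with the smallest runtime/shares ratio. Repeatedly
// applying this keeps q_runtime / q_shares equal across queues, so a
// queue with 10x the shares receives 10x the CPU over time.
std::size_t pick_next(const std::vector<Queue>& queues) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < queues.size(); ++i) {
        double candidate = queues[i].runtime_ms / queues[i].shares;
        double current   = queues[best].runtime_ms / queues[best].shares;
        if (candidate < current) best = i;
    }
    return best;
}
```

Note this is deliberately not round-robin: a queue that has fallen behind its fair share is chosen repeatedly until the ratios equalize.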
13. Preemption techniques
■ Read clock and compare to timeslice end deadline
● Prohibitively expensive
■ Use timer+signal
● Works, icky locking
■ Use kernel timer to write to user memory location
● linux-aio or io_uring
● Tricky but very efficient
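The common thread in the last two techniques is that the timer expiry only writes a flag in user memory, so the hot path is a single relaxed atomic load instead of a clock read. A sketch of the timer+signal variant (names like `need_preempt` and `arm_timeslice` are illustrative, not Seastar's exact API):

```cpp
#include <atomic>
#include <cassert>
#include <csignal>
#include <sys/time.h>

// Set asynchronously when the time slice ends; read on every
// preemption check.
static std::atomic<bool> need_preempt_flag{false};

extern "C" void on_timeslice_end(int) {
    // Only writes a flag: trivially async-signal-safe.
    need_preempt_flag.store(true, std::memory_order_relaxed);
}

// The cheap check tasks sprinkle through their loops.
inline bool need_preempt() {
    return need_preempt_flag.load(std::memory_order_relaxed);
}

// Arm a one-shot timer for the time slice; SIGALRM fires at expiry.
void arm_timeslice(long usec) {
    std::signal(SIGALRM, on_timeslice_end);
    itimerval tv{};
    tv.it_value.tv_usec = usec;
    setitimer(ITIMER_REAL, &tv, nullptr);
}
```

The kernel-timer variant (linux-aio or io_uring) removes even the signal delivery by having the kernel write the flag directly, which is what makes it "tricky but very efficient".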
14. Stall detector
■ Signal-based mechanism to detect where you “forgot” to add a preemption check
■ cf. “Accidentally Quadratic”
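One hypothetical way such a detector can work (a sketch under assumptions, not Seastar's actual implementation): tasks bump a counter at every preemption check, and a periodic timer signal looks at whether the counter moved since its last tick. If it has not, some loop forgot its check and is stalling the reactor.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

static std::atomic<std::uint64_t> checks{0};  // bumped by running tasks
static std::uint64_t last_seen = 0;           // detector's private state
static int stalls_reported = 0;

// Called from task code at every preemption check.
void on_preemption_check() {
    checks.fetch_add(1, std::memory_order_relaxed);
}

// Called periodically from a timer signal handler.
void on_detector_tick() {
    std::uint64_t now = checks.load(std::memory_order_relaxed);
    if (now == last_seen) {
        ++stalls_reported;  // a real detector would capture a backtrace here
    }
    last_seen = now;
}
```

Capturing a backtrace from the signal handler is what points you at the accidentally-quadratic loop.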
18. Resource partitioning (QoS)
■ Provide a different quality of service to different users
[Diagram: the Seastar scheduler multiplexes tasks (Memtable flush, Commitlog, Compaction, Repair, Query 1, Query 2) onto shared resources (CPU, SSD, WAN); a Compaction Backlog Monitor and a Memory Monitor adjust priorities]
19. I/O scheduling
■ Logically, the same as CPU scheduling
■ But the scheduled entity is much more complicated than a CPU core
■ More difficult cross-core coordination
■ More in Pavel’s talk
● “What We Need to Unlearn about Persistent Storage”