Optimizing shard-aware drivers for ScyllaDB has involved multiple initiatives, often requiring a complete rewrite from scratch. Learn about the work undertaken to improve the performance of ScyllaDB drivers for both Go and Rust, and how the Rust code base will serve as the core for drivers in other languages going forward. The session highlights performance gains obtained with techniques available in each language, including outperforming Google's B-tree implementation with Go generics and building a new Rust driver on the asynchronous Tokio framework.
2. Piotr Grabowski
■ Software Team Leader at ScyllaDB responsible for all
ScyllaDB drivers, ScyllaDB Kafka Connectors (ScyllaDB
Sink Connector and ScyllaDB CDC Source Connector)
■ Joined ScyllaDB 2.5 years ago
5. Drivers 101
■ Drivers (in this presentation) - libraries that allow sending queries to ScyllaDB
■ Primary protocol: CQL (Cassandra Query Language) protocol
■ Runs over TCP
■ ScyllaDB supports CQL v4
■ Frame-based protocol, supporting multiple streams
■ Supports LZ4 and Snappy compression
■ ScyllaDB drivers support shard awareness:
■ Driver can connect to a specific shard of ScyllaDB
6. Drivers 101 - Role of Drivers
■ The role of drivers:
■ Serialization/deserialization of CQL frames
■ Serialization/deserialization of ScyllaDB types
■ Querying and maintaining metadata about tables/nodes
■ Routing requests to correct nodes (and shards)
■ Sending requests across the network
■ Conveniently constructing and executing queries in your language of choice:
■ gocqlx
■ Java Driver’s Mapper interface
7. Drivers 101 - Performance
■ How can the driver improve performance?
■ Shard awareness: sending the query to the correct shard
■ Partitioners: ScyllaDB’s CDC (Change Data Capture) uses a custom
partitioner, which determines the node to send the query to
■ LWT (lightweight transaction) optimization: consistently prefer a single replica
when executing an LWT query to avoid Paxos conflicts
■ Optimizing hot paths in the driver:
■ Serialization/deserialization
■ Routing code
■ Avoiding copies, allocations and locks
9. ■ A new ScyllaDB driver for the Go language
■ Developed by a University of Warsaw student team for their Bachelor’s
thesis
■ Focus of the project: performance
■ Avoiding mutexes
■ Avoiding allocations - reducing GC impact
■ Faster B-trees with generics
ScyllaDB Go Driver
11. ■ Approach inspired by ScyllaDB’s shared-nothing architecture
■ Techniques:
■ Better overall picture of the architecture
■ Clear, efficient communication between components
■ Using idiomatic Go channels and atomics
■ Result:
■ The driver only uses a single mutex,
local to a TCP connection
■ Reducing race conditions
ScyllaDB Go Driver - Mutexes
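One place where atomics can stand in for a mutex is stream ID allocation: every in-flight request on a connection needs a unique CQL stream ID, and a bitmap updated with compare-and-swap lets many goroutines allocate and release IDs concurrently without locking. A minimal sketch, assuming a hypothetical `streamIDs` type with a fixed capacity; it illustrates the technique and is not the ScyllaDB Go Driver's actual code.

```go
package main

import (
	"fmt"
	"math/bits"
	"sync"
	"sync/atomic"
)

// streamIDs is a lock-free stream ID allocator backed by a bitmap.
// A set bit means the corresponding ID is in use.
type streamIDs struct {
	words []uint64
}

func newStreamIDs(n int) *streamIDs {
	s := &streamIDs{words: make([]uint64, (n+63)/64)}
	if r := n % 64; r != 0 {
		// Mark IDs >= n in the last word as permanently in use.
		s.words[len(s.words)-1] = ^uint64(0) << r
	}
	return s
}

// alloc returns a free ID, or -1 if all IDs are in use.
func (s *streamIDs) alloc() int {
	for i := range s.words {
		for {
			old := atomic.LoadUint64(&s.words[i])
			if old == ^uint64(0) {
				break // word is full, try the next one
			}
			bit := bits.TrailingZeros64(^old) // lowest free bit
			if atomic.CompareAndSwapUint64(&s.words[i], old, old|1<<bit) {
				return i*64 + bit
			}
			// Another goroutine changed the word; retry with a fresh value.
		}
	}
	return -1
}

// release marks id as free again.
func (s *streamIDs) release(id int) {
	mask := uint64(1) << (id % 64)
	for {
		old := atomic.LoadUint64(&s.words[id/64])
		if atomic.CompareAndSwapUint64(&s.words[id/64], old, old&^mask) {
			return
		}
	}
}

func main() {
	ids := newStreamIDs(128)
	var wg sync.WaitGroup
	got := make(chan int, 128)
	for g := 0; g < 128; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			got <- ids.alloc()
		}()
	}
	wg.Wait()
	close(got)
	seen := map[int]bool{}
	for id := range got {
		seen[id] = true
	}
	fmt.Println("unique IDs:", len(seen), "next:", ids.alloc())
}
```

A failed compare-and-swap only means another goroutine won the race for that word, so the loop simply rereads and retries; no goroutine ever blocks while holding a lock.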
12. ■ Initial benchmarks of gocql showed that its performance was heavily
influenced by Go’s garbage collector
■ ScyllaDB Go Driver performs 5x fewer allocations than gocql
■ The result of:
■ Meticulously designing each component of the code
■ Faster B-Trees with Generics
■ Reusing buffers owned by writer/reader
ScyllaDB Go Driver - Memory
13. ■ Go 1.18 introduced support for generics
■ Using generics in the B-tree implementation yielded a 40% performance
improvement over Google’s btree implementation
■ The implementation performs zero allocations in all operations (except
cloning)
■ By replacing interface usage with generics, this approach
avoids variables escaping to the heap
■ Google’s btree maintainers also implemented BTreeG, a generic variant of their B-tree
Faster B-Trees with Generics
14. ■ While profiling different drivers, we noticed that they differ in the number of
syscalls they perform
■ Difference in network syscalls:
■ Observed using syscount and socksize tools
■ Fewer sendmsg (or similar) syscalls…
■ …but with larger data buffers
■ Instead of sending each CQL frame separately,
better to send multiple ones at a time
ScyllaDB Go Driver - Request Coalescing
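Request coalescing can be sketched with a single writer goroutine that owns the connection: it blocks for the first frame, then drains whatever other frames are already queued (up to a size limit) and sends them all with one write. A minimal sketch; `coalesce`, `countingWriter` and `maxBatch` are hypothetical names, and `countingWriter` stands in for a socket, where each `Write` call would be one `sendmsg`/`write` syscall.

```go
package main

import (
	"fmt"
	"io"
)

// countingWriter stands in for a TCP connection and counts Write calls.
type countingWriter struct {
	writes int
	data   []byte
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.writes++
	c.data = append(c.data, p...)
	return len(p), nil
}

// coalesce writes frames received on the channel, batching every frame that
// is already queued (until the batch reaches maxBatch bytes) into one Write.
func coalesce(frames <-chan []byte, w io.Writer, maxBatch int) error {
	buf := make([]byte, 0, maxBatch)
	for first := range frames { // block until at least one frame is ready
		buf = append(buf[:0], first...)
	drain:
		for len(buf) < maxBatch {
			select {
			case f, ok := <-frames:
				if !ok {
					break drain // channel closed: flush what we have
				}
				buf = append(buf, f...)
			default:
				break drain // nothing else queued: do not wait
			}
		}
		if _, err := w.Write(buf); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	frames := make(chan []byte, 16)
	for i := 0; i < 10; i++ {
		frames <- []byte(fmt.Sprintf("frame-%d;", i))
	}
	close(frames)
	conn := &countingWriter{}
	if err := coalesce(frames, conn, 4096); err != nil {
		panic(err)
	}
	fmt.Println("frames: 10, writes:", conn.writes)
}
```

The non-blocking `default` branch means an idle connection still sends a lone frame immediately, while under load frames naturally accumulate and are sent in fewer, larger writes.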
18. ■ The idea was born during a hackathon in 2020
■ We have continued development over the last 3 years
■ Uses the Tokio framework
■ The driver is now feature complete, supporting many advanced features:
■ Shard awareness
■ Asynchronous interface with support for large concurrency
■ Compression
■ All CQL types
■ Speculative execution
■ TLS support
ScyllaDB Rust Driver
19. ■ An issue was raised by the author of latte, a benchmarking tool for ScyllaDB
and Cassandra
■ The driver had problems scaling to a high number of concurrent requests
■ We identified the root cause in the implementation of
FuturesUnordered (from the futures crate), a utility that gathers many futures and waits for them
■ Due to cooperative scheduling in Tokio, it was possible for
FuturesUnordered to iterate over all futures each time
it was polled
■ A fix was merged upstream to limit the number of
futures iterated over in each poll
ScyllaDB Rust Driver - O(N²) in Tokio?
20. ■ Rack-aware load balancing
■ Reduce costs by avoiding queries to ScyllaDB nodes in other racks (which may correspond, for example, to
AWS Availability Zones)
■ Reduce the latency by querying the nearest rack
■ Iterator-based deserialization
■ The current implementation deserializes row data
into the equivalent of Vec<Vec<Option<CqlValue>>>
■ Skip materializing all rows into a vector;
deserialize on the fly
■ Load balancing refactor
■ Main goal: reduce the number of allocations and atomic
operations while building the query plan,
especially on the happy path
ScyllaDB Rust Driver - Ongoing Efforts
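Iterator-based deserialization can be sketched as a lazy row iterator: instead of building a vector of all rows up front, it walks the raw rows content of a CQL v4 RESULT frame and yields one row at a time, with cells that are sub-slices of the received buffer (in CQL v4, each cell is an [int] length followed by that many bytes, and a negative length means null). The sketch is written in Go for consistency with the other examples and does not reflect the Rust driver's actual API; it assumes the row and column counts were already read from the result metadata, and `rowIter` and `appendCell` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// rowIter lazily decodes rows from the rows content of a RESULT frame.
type rowIter struct {
	data      []byte   // remaining undecoded bytes
	columns   int      // columns per row, from the result metadata
	remaining int      // rows not yet decoded
	cells     [][]byte // reused between calls to Next
}

var errTruncated = errors.New("truncated rows content")

// Next decodes the next row. The cells alias the frame buffer and the
// returned slice is reused by the following call; a nil cell is a CQL null.
func (it *rowIter) Next() ([][]byte, bool, error) {
	if it.remaining == 0 {
		return nil, false, nil
	}
	it.cells = it.cells[:0]
	for c := 0; c < it.columns; c++ {
		if len(it.data) < 4 {
			return nil, false, errTruncated
		}
		n := int(int32(binary.BigEndian.Uint32(it.data)))
		it.data = it.data[4:]
		if n < 0 {
			it.cells = append(it.cells, nil) // null value
			continue
		}
		if n > len(it.data) {
			return nil, false, errTruncated
		}
		it.cells = append(it.cells, it.data[:n:n]) // no copy
		it.data = it.data[n:]
	}
	it.remaining--
	return it.cells, true, nil
}

// appendCell encodes one [bytes] cell; a nil value is encoded as null.
func appendCell(b, v []byte) []byte {
	if v == nil {
		return binary.BigEndian.AppendUint32(b, 0xFFFFFFFF) // [int] -1
	}
	b = binary.BigEndian.AppendUint32(b, uint32(len(v)))
	return append(b, v...)
}

func main() {
	var raw []byte
	raw = appendCell(raw, []byte("a"))
	raw = appendCell(raw, []byte("bb"))
	raw = appendCell(raw, nil)
	raw = appendCell(raw, []byte("ccc"))
	it := &rowIter{data: raw, columns: 2, remaining: 2}
	for {
		row, ok, err := it.Next()
		if err != nil {
			panic(err)
		}
		if !ok {
			break
		}
		fmt.Printf("%q\n", row)
	}
}
```

Memory use then stays proportional to one row rather than the whole result, and the caller decides which cells, if any, to convert into typed values.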
22. ■ When benchmarking ScyllaDB Rust Driver against other drivers, we
found it to be the most performant driver, outperforming the C++ driver
■ Why not develop a way to use ScyllaDB Rust Driver from C++ code?
■ Benefits of a unified core:
■ Higher performance
■ Easier maintenance
■ Fewer bugs
Bindings to ScyllaDB Rust Driver
23. ■ We started with bindings for C/C++
■ C++ bindings to the Rust driver; the same API as the original C++ driver
■ Drop-in replacement (just replace the .so file)
■ The resulting project has an order of magnitude fewer lines of code (LoC)
■ Better stability, fewer problems compared to the original C++ driver
Bindings to ScyllaDB Rust Driver - C/C++