SlideShare une entreprise Scribd logo
1  sur  47
InfoQ.com: News & Community Site
• Over 1,000,000 software developers, architects and CTOs read the site world-
wide every month
• 250,000 senior developers subscribe to our weekly newsletter
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• 2 dedicated podcast channels: The InfoQ Podcast, with a focus on
Architecture and The Engineering Culture Podcast, with a focus on building
• 96 deep dives on innovative topics packed as downloadable emags and
minibooks
• Over 40 new content items per week
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
scylladb
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com
ScyllaDB: Achieving No-Compromise
Performance
Avi Kivity, CTO
@AviKivity
(Hiring!)
Agenda
Background
Goals
Methods
Conclusion
Non-Agenda
● Docker
● Microservices
● Node.js
● Docker
● Orchestration
● JVM GC Tuning
● JSON over HTTP
● Docker
More Non-Agenda
● Cache lines, coherency protocols
● NUMA
● Algorithms are the only thing that matters,
everything else is implementation detail
● Docker
Background - ScyllaDB
● Clustered NoSQL database compatible with
Apache Cassandra
● ~10X performance on same hardware
● Low latency, esp. higher percentiles
● Self tuning
● C++14, fully asynchronous; Seastar!
YCSB Benchmark:
3 node Scylla cluster vs 3, 9, 15, 30
Cassandra machines
3 Scylla
30 Cassandra
3 Cassandra
3 Scylla
30 Cassandra
3 Cassandra
Log-Structured Merge Tree
SStable 1
SStable 2
SStable 3
Time
SStable 4
SStable 5
SStable 1+2+3
Foreground Job Background Job
High-level Goals
● Efficiency:
○ Make the most out of every cycle
● Utilization:
○ Squeeze every cycle from the machine
● Control
○ Spend the cycles on what we want, when we want
Characterizing the problem
● Large numbers of small operations
○ Make coordination cheap
● Lots of communications
○ Within the machine
○ With disk
○ With other machines
Asynchrony,
Everywhere
● Thread-per-core design
○ Never block
● Asynchronous networking
● Asynchronous file I/O
● Asynchronous multicore
General Architecture
Scylla has its own task scheduler
Traditional stack Scylla’s stack
Promise
Task
Promise
Task
Promise
Task
Promise
Task
CPU
Promise
Task
Promise
Task
Promise
Task
Promise
Task
CPU
Promise
Task
Promise
Task
Promise
Task
Promise
Task
CPU
Promise
Task
Promise
Task
Promise
Task
Promise
Task
CPU
Promise
Task
Promise
Task
Promise
Task
Promise
Task
CPU
Promise is a
pointer to
eventually
computed value
Task is a
pointer to a
lambda function
Scheduler
CPU
Scheduler
CPU
Scheduler
CPU
Scheduler
CPU
Scheduler
CPU
Thread
Stack
Thread
Stack
Thread
Stack
Thread
Stack
Thread
Stack
Thread
Stack
Thread
Stack
Thread
Stack
Thread is a
function pointer
Stack is a byte
array from 64k
to megabytes
Context switch cost is
high. Large stacks pollutes
the caches No sharing, millions of
parallel events
The Concurrency
Dilemma
Fundamental performance equation
Concurrency = Throughput * Latency
Fundamental performance equation
Throughput =
Concurrency
Latency
Fundamental performance equation
Latency =
Concurrency
Throughput
Lower bounds for concurrency
● Disks want minimum iodepth for full
throughput (heads/chips)
● Remote nodes need concurrency to hide
network latency and their own min.
concurrency
● Compute wants work for each core
Results of Mathematical Analysis
● Want high concurrency (for throughput)
● Want low concurrency (for latency)
● Resources require concurrency for full
utilization
Sources of concurrency
● Users
○ Reduce concurrency / add nodes
● Internal processes
○ Generate as much concurrency as possible
○ Schedule
Resource Scheduling
Scheduler
Storage
8
User read
User write
Compaction (internal)
Streaming (internal)
30
12
50
50
Why not the Linux I/O scheduler?
● Can only communicate priority by originating
thread
● Will reorder/merge like crazy
● Disable
Figuring out optimal disk
concurrency
Max useful disk
concurrency
Cache design
Cache files or objects?
Using the kernel page cache
● 4k granularity
● Thread-safe
● Synchronous APIs
● General-purpose
● Lack of control (1)
● Lack of control (2)
● Exists
● Hundreds of
hacker-years
● Handling lots of edge
cases
Unified cache
Cassandra Scylla
Key cache
Row cache
On-heap /
Off-heap
Linux page cache
SSTables
Unified cache
SSTables
Workload Conditioning
Workload Conditioning
• Internal feedback loops to balance competing loads
Memtable
Seastar
Scheduler
Compaction
Query
Repair
Commitlog
SSD
Compaction
Backlog
Monitor
Memory
Monitor
Adjust priority
Adjust priority
WAN
CPU
Replacing the system
memory allocator
System memory allocator problems
● Thread safe
● Allocation back pressure
Seastar memory allocator
● Non-Thread safe!
○ Each core gets a private memory pool
● Allocation back pressure
○ Allocator calls a callback when low on memory
○ Scylla evicts cache in response
One allocator
is not enough
Remaining problems with
malloc/free
● Memory gets fragmented over time
○ If workload changes sizes of allocated objects
● Allocating a large contiguous block
requires evicting most of cache
OOM :(Memory
Log-structured memory allocation
● The cache
○ Large majority of memory allocated
○ Small subset of allocation sites
● Teach allocator how to move allocated
objects around
○ Updating references
Log-structured memory allocation
Fancy Animation
Future Improvements
Userspace TCP/IP stack
● Thread-per-core design
● Use DPDK to drive hardware
● Present as experimental mode
○ Needs more testing and productization
Query Compilation to Native Code
● Use LLVM to JIT-compile CQL queries
● Embed database schema and internal
object layouts into the query
● Full control of the software stack can generate big
payoffs
● Careful system design can maximize throughput
● Without sacrificing latency
● Without requiring endless end-user tuning
● While having a lot of fun
Conclusions
● Download: http://www.scylladb.com
● Twitter: @ScyllaDB
● Source: http://github.com/scylladb/scylla
● Mailing lists: scylladb-user @ groups.google.com
● Company site & blog: http://www.scylladb.com
How to interact
THE SCYLLA IS THE LIMIT
Thank you.
Watch the video with slide synchronization on
InfoQ.com!
https://www.infoq.com/presentations/scylladb

Contenu connexe

En vedette

A origem do bem e do mal
A origem do bem e do malA origem do bem e do mal
A origem do bem e do malHelio Cruz
 
Seastar @ SF/BA C++UG
Seastar @ SF/BA C++UGSeastar @ SF/BA C++UG
Seastar @ SF/BA C++UGAvi Kivity
 
ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016Tzach Livyatan
 
Scylla: 1 Million CQL operations per second per server
Scylla: 1 Million CQL operations per second per serverScylla: 1 Million CQL operations per second per server
Scylla: 1 Million CQL operations per second per serverAvi Kivity
 
Scylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDSScylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDSScyllaDB
 
Back to the future with C++ and Seastar
Back to the future with C++ and SeastarBack to the future with C++ and Seastar
Back to the future with C++ and SeastarTzach Livyatan
 
OSv – The OS designed for the Cloud
OSv – The OS designed for the CloudOSv – The OS designed for the Cloud
OSv – The OS designed for the CloudYandex
 
Seastar / ScyllaDB, or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB,  or how we implemented a 10-times faster CassandraSeastar / ScyllaDB,  or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB, or how we implemented a 10-times faster CassandraTzach Livyatan
 
OSv: probably the best OS for cloud workloads you've never hear of
OSv: probably the best OS for cloud workloads you've never hear ofOSv: probably the best OS for cloud workloads you've never hear of
OSv: probably the best OS for cloud workloads you've never hear ofrhatr
 
OSv presentation from Linux Foundation Collaboration Summit
OSv presentation from Linux Foundation Collaboration SummitOSv presentation from Linux Foundation Collaboration Summit
OSv presentation from Linux Foundation Collaboration SummitDon Marti
 
Advanced portfolio progress sheet 11 june
Advanced portfolio progress sheet 11 juneAdvanced portfolio progress sheet 11 june
Advanced portfolio progress sheet 11 junejphibbert
 
Competitor beer brand in thunderbolt
Competitor beer brand in thunderboltCompetitor beer brand in thunderbolt
Competitor beer brand in thunderboltsbest007
 

En vedette (14)

El franquismo
El franquismoEl franquismo
El franquismo
 
A origem do bem e do mal
A origem do bem e do malA origem do bem e do mal
A origem do bem e do mal
 
Seastar @ SF/BA C++UG
Seastar @ SF/BA C++UGSeastar @ SF/BA C++UG
Seastar @ SF/BA C++UG
 
ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016
 
Scylla: 1 Million CQL operations per second per server
Scylla: 1 Million CQL operations per second per serverScylla: 1 Million CQL operations per second per server
Scylla: 1 Million CQL operations per second per server
 
Scylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDSScylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDS
 
Back to the future with C++ and Seastar
Back to the future with C++ and SeastarBack to the future with C++ and Seastar
Back to the future with C++ and Seastar
 
OSv – The OS designed for the Cloud
OSv – The OS designed for the CloudOSv – The OS designed for the Cloud
OSv – The OS designed for the Cloud
 
Seastar / ScyllaDB, or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB,  or how we implemented a 10-times faster CassandraSeastar / ScyllaDB,  or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB, or how we implemented a 10-times faster Cassandra
 
OSv: probably the best OS for cloud workloads you've never hear of
OSv: probably the best OS for cloud workloads you've never hear ofOSv: probably the best OS for cloud workloads you've never hear of
OSv: probably the best OS for cloud workloads you've never hear of
 
OSv presentation from Linux Foundation Collaboration Summit
OSv presentation from Linux Foundation Collaboration SummitOSv presentation from Linux Foundation Collaboration Summit
OSv presentation from Linux Foundation Collaboration Summit
 
Katy
KatyKaty
Katy
 
Advanced portfolio progress sheet 11 june
Advanced portfolio progress sheet 11 juneAdvanced portfolio progress sheet 11 june
Advanced portfolio progress sheet 11 june
 
Competitor beer brand in thunderbolt
Competitor beer brand in thunderboltCompetitor beer brand in thunderbolt
Competitor beer brand in thunderbolt
 

Plus de C4Media

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoC4Media
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileC4Media
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020C4Media
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsC4Media
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No KeeperC4Media
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like OwnersC4Media
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaC4Media
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideC4Media
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 

Plus de C4Media (20)

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live Video
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy Mobile
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java Applications
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No Keeper
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like Owners
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 

Dernier

DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 

Dernier (20)

DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 

ScyllaDB: Achieving No-compromise Performance

  • 1.
  • 2. InfoQ.com: News & Community Site • Over 1,000,000 software developers, architects and CTOs read the site world- wide every month • 250,000 senior developers subscribe to our weekly newsletter • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture and The Engineering Culture Podcast, with a focus on building • 96 deep dives on innovative topics packed as downloadable emags and minibooks • Over 40 new content items per week Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ scylladb
  • 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon San Francisco www.qconsf.com
  • 4. ScyllaDB: Achieving No-Compromise Performance Avi Kivity, CTO @AviKivity (Hiring!)
  • 6. Non-Agenda ● Docker ● Microservices ● Node.js ● Docker ● Orchestration ● JVM GC Tuning ● JSON over HTTP ● Docker
  • 7. More Non-Agenda ● Cache lines, coherency protocols ● NUMA ● Algorithms are the only thing that matters, everything else is implementation detail ● Docker
  • 8. Background - ScyllaDB ● Clustered NoSQL database compatible with Apache Cassandra ● ~10X performance on same hardware ● Low latency, esp. higher percentiles ● Self tuning ● C++14, fully asynchronous; Seastar!
  • 9. YCSB Benchmark: 3 node Scylla cluster vs 3, 9, 15, 30 Cassandra machines 3 Scylla 30 Cassandra 3 Cassandra 3 Scylla 30 Cassandra 3 Cassandra
  • 10. Log-Structured Merge Tree SStable 1 SStable 2 SStable 3 Time SStable 4 SStable 5 SStable 1+2+3 Foreground Job Background Job
  • 11. High-level Goals ● Efficiency: ○ Make the most out of every cycle ● Utilization: ○ Squeeze every cycle from the machine ● Control ○ Spend the cycles on what we want, when we want
  • 12. Characterizing the problem ● Large numbers of small operations ○ Make coordination cheap ● Lots of communications ○ Within the machine ○ With disk ○ With other machines
  • 14.
  • 15.
  • 16. ● Thread-per-core design ○ Never block ● Asynchronous networking ● Asynchronous file I/O ● Asynchronous multicore General Architecture
  • 17. Scylla has its own task scheduler Traditional stack Scylla’s stack Promise Task Promise Task Promise Task Promise Task CPU Promise Task Promise Task Promise Task Promise Task CPU Promise Task Promise Task Promise Task Promise Task CPU Promise Task Promise Task Promise Task Promise Task CPU Promise Task Promise Task Promise Task Promise Task CPU Promise is a pointer to eventually computed value Task is a pointer to a lambda function Scheduler CPU Scheduler CPU Scheduler CPU Scheduler CPU Scheduler CPU Thread Stack Thread Stack Thread Stack Thread Stack Thread Stack Thread Stack Thread Stack Thread Stack Thread is a function pointer Stack is a byte array from 64k to megabytes Context switch cost is high. Large stacks pollutes the caches No sharing, millions of parallel events
  • 21. Fundamental performance equation Latency = Concurrency Throughput
  • 22. Lower bounds for concurrency ● Disks want minimum iodepth for full throughput (heads/chips) ● Remote nodes need concurrency to hide network latency and their own min. concurrency ● Compute wants work for each core
  • 23. Results of Mathematical Analysis ● Want high concurrency (for throughput) ● Want low concurrency (for latency) ● Resources require concurrency for full utilization
  • 24. Sources of concurrency ● Users ○ Reduce concurrency / add nodes ● Internal processes ○ Generate as much concurrency as possible ○ Schedule
  • 25. Resource Scheduling Scheduler Storage 8 User read User write Compaction (internal) Streaming (internal) 30 12 50 50
  • 26. Why not the Linux I/O scheduler? ● Can only communicate priority by originating thread ● Will reorder/merge like crazy ● Disable
  • 27. Figuring out optimal disk concurrency Max useful disk concurrency
  • 29. Using the kernel page cache ● 4k granularity ● Thread-safe ● Synchronous APIs ● General-purpose ● Lack of control (1) ● Lack of control (2) ● Exists ● Hundreds of hacker-years ● Handling lots of edge cases
  • 30. Unified cache Cassandra Scylla Key cache Row cache On-heap / Off-heap Linux page cache SSTables Unified cache SSTables
  • 32. Workload Conditioning • Internal feedback loops to balance competing loads Memtable Seastar Scheduler Compaction Query Repair Commitlog SSD Compaction Backlog Monitor Memory Monitor Adjust priority Adjust priority WAN CPU
  • 34. System memory allocator problems ● Thread safe ● Allocation back pressure
  • 35. Seastar memory allocator ● Non-Thread safe! ○ Each core gets a private memory pool ● Allocation back pressure ○ Allocator calls a callback when low on memory ○ Scylla evicts cache in response
  • 37. Remaining problems with malloc/free ● Memory gets fragmented over time ○ If workload changes sizes of allocated objects ● Allocating a large contiguous block requires evicting most of cache
  • 39. Log-structured memory allocation ● The cache ○ Large majority of memory allocated ○ Small subset of allocation sites ● Teach allocator how to move allocated objects around ○ Updating references
  • 42. Userspace TCP/IP stack ● Thread-per-core design ● Use DPDK to drive hardware ● Present as experimental mode ○ Needs more testing and productization
  • 43. Query Compilation to Native Code ● Use LLVM to JIT-compile CQL queries ● Embed database schema and internal object layouts into the query
  • 44. ● Full control of the software stack can generate big payoffs ● Careful system design can maximize throughput ● Without sacrificing latency ● Without requiring endless end-user tuning ● While having a lot of fun Conclusions
  • 45. ● Download: http://www.scylladb.com ● Twitter: @ScyllaDB ● Source: http://github.com/scylladb/scylla ● Mailing lists: scylladb-user @ groups.google.com ● Company site & blog: http://www.scylladb.com How to interact
  • 46. THE SCYLLA IS THE LIMIT Thank you.
  • 47. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/scylladb