Introducing Scylla Open Source 4.0
Dor Laor, ScyllaDB Co-Founder & CEO
Avi Kivity, ScyllaDB Co-Founder & CTO
Presenters
Dor Laor
Dor Laor is the CEO of ScyllaDB. Previously, Dor was part of the founding team of the KVM
hypervisor at Qumranet, which was acquired by Red Hat. At Red Hat, Dor managed KVM
and Xen development for several years. Dor holds an MSc from the Technion and a PhD in
snowboarding.
Avi Kivity
Avi Kivity, CTO of ScyllaDB, is known mostly for starting the Kernel-based Virtual Machine (KVM)
project, the hypervisor underlying many production clouds. He worked for Qumranet and Red
Hat as KVM maintainer until December 2012. Avi is now CTO of ScyllaDB, a company that seeks to
bring the same kind of innovation to the public cloud space.
Agenda
■ Alternator: DynamoDB-Compatible API
■ Lightweight Transactions
■ CDC
■ Repair Improvements
■ Scylla Operator for Kubernetes
■ Comparing Scylla 4.0 to Cassandra 4.0
Scylla Alternator: DynamoDB-Compatible API
■ Part of Scylla core, not a layer
■ Fits the Scylla model well: yet ‘another’ wide-column database
■ Drop-in replacement for DynamoDB
■ Open source!
■ Consumable through
{git, gcc, docker, k8s, rpm, DBaaS}
Scylla Alternator: Why Use?
■ Substantially better price/performance
■ Better tail latency
■ Easy development – docker & laptop
■ It’s OPEN
■ Multi-cloud, hybrid-cloud, your own cloud
■ Observability: Prometheus and Grafana
■ No limits on object sizes, partition sizes, etc.
■ No throttled requests
■ Workload consolidation
■ Workload prioritization (coming)
Scylla Alternator: When to Use?
■ Docker image – Easy & cheap development
■ Leave one datacenter on DynamoDB, others on ScyllaDB
■ Scylla Cloud – Easy managed database
■ Ultimately – Always!
Protocol Comparison: DynamoDB API vs CQL
■ We recently covered it in this blog post
■ HTTP native vs CQL
■ HTTP lacks request multiplexing
■ Textual vs. binary protocol
■ CQL prepared statements
■ CQL topology awareness
■ CRDT vs. RMW (read-modify-write), LWT
■ Schema-full vs. schema-less
■ Keyspaces vs. single table design
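To make the "textual vs. binary" point concrete, here is a minimal sketch of what a DynamoDB-API request looks like on the wire: an HTTP POST with a JSON body, where the operation is named in a header. The helper name `build_put_item_request`, the table and item, and the endpoint `http://localhost:8000` are assumptions for illustration. The helper only assembles the request; it sends nothing and omits the request signing that real clients add.

```python
import json


def build_put_item_request(table, item, endpoint="http://localhost:8000"):
    """Build (url, headers, body) for a DynamoDB-style PutItem call.

    Hypothetical helper: it only assembles the request and sends nothing.
    The endpoint is an assumed local address, not taken from the slides.
    """
    body = json.dumps({"TableName": table, "Item": item})
    headers = {
        # The operation is selected by a header, not by the URL path.
        "X-Amz-Target": "DynamoDB_20120810.PutItem",
        "Content-Type": "application/x-amz-json-1.0",
    }
    return endpoint + "/", headers, body.encode("utf-8")


url, headers, body = build_put_item_request(
    "employees",
    {"last_name": {"S": "Doe"}, "age": {"N": "42"}},
)
print(headers["X-Amz-Target"])
print(body.decode("utf-8"))
```

Every request carries the table and attribute names as text, whereas a CQL prepared statement sends a statement ID plus bound values in a binary frame.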
Lightweight Transactions – Linearizability
■ Long anticipated, finally, feature parity ;)
■ Based on a Paxos implementation, like Cassandra
■ 3 round trips vs Cassandra’s 4
■ More improvements
CQL Conditional Statement
> UPDATE employees SET join_date = '2018-05-19'
    WHERE firstname = 'John' AND lastname = 'Doe'
    IF join_date != null;

 [applied]
-----------
     False
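The semantics of the conditional statement can be sketched in a few lines of Python. This is a toy, single-process model, assuming an in-memory dict stands in for the table; `conditional_update` is a hypothetical name. It shows only the check-then-write behaviour and the `[applied]` result, not the Paxos rounds that make the operation safe across replicas.

```python
def conditional_update(table, key, assignments, condition):
    """Apply `assignments` to the row at `key` only if `condition(row)` holds.

    Returns the [applied] flag. Toy model: no replication, no Paxos.
    """
    row = table.get(key, {})
    if not condition(row):
        return False
    table.setdefault(key, {}).update(assignments)
    return True


employees = {("John", "Doe"): {"join_date": None}}

# Mirrors: UPDATE ... SET join_date = '2018-05-19' ... IF join_date != null
applied = conditional_update(
    employees,
    ("John", "Doe"),
    {"join_date": "2018-05-19"},
    lambda row: row.get("join_date") is not None,
)
print(applied)  # False: join_date was null, so nothing is written
```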
Conditional Batches
BEGIN BATCH
    UPDATE tasks SET n_abandoned = 0 WHERE project_id = 1
        IF n_abandoned > 0;
    DELETE FROM tasks WHERE project_id = 1
        AND state = 'Abandoned';
APPLY BATCH;
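A conditional batch is all-or-nothing: if any condition fails, none of the statements take effect. The sketch below models that for a single partition held in a Python dict. The function and helper names are hypothetical, and the model ignores replication entirely.

```python
import copy


def apply_conditional_batch(partition, statements):
    """Apply every statement in the batch, or none of them.

    `statements` is a list of (condition, mutation) pairs; condition may be
    None for unconditional statements. Hypothetical model for illustration.
    """
    # Evaluate all conditions against the state before the batch.
    for condition, _ in statements:
        if condition is not None and not condition(partition):
            return False
    # Work on a copy so a failing mutation cannot leave a partial result.
    staged = copy.deepcopy(partition)
    for _, mutation in statements:
        mutation(staged)
    partition.clear()
    partition.update(staged)
    return True


def reset_counter(p):
    p["n_abandoned"] = 0


def drop_abandoned(p):
    p["tasks"] = [t for t in p["tasks"] if t["state"] != "Abandoned"]


project = {
    "n_abandoned": 1,
    "tasks": [{"id": 1, "state": "Abandoned"}, {"id": 2, "state": "Open"}],
}
batch = [
    (lambda p: p["n_abandoned"] > 0, reset_counter),
    (None, drop_abandoned),
]
print(apply_conditional_batch(project, batch))  # True on the first run
print(apply_conditional_batch(project, batch))  # False: counter is already 0
```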
Lightweight Transactions
■ 3 round trips vs Cassandra’s 4
■ Shard-aware driver
■ Safe/durable commitlog by default
with the best performance
■ Good observability
Change Data Capture – CDC
Consumable modification record for one
or more tables in the database
■ Capture changes (write/update/delete)
■ Asynchronously readable by a consumer
■ Using CQL statements, protocol, and driver
■ Table-level granularity
■ Highly Available
■ Persistent
■ (Eventually) Consistent
Feeding Microservices
[Diagram labels: Kafka, CDC Stream, Fraud Detection, Data Lake, Real Time Analysis, Search]
CREATE TABLE company.employees (
    department text,
    last_name text,
    first_name text,
    age int,
    level int,
    PRIMARY KEY (department, last_name, first_name)
) WITH cdc = {/* CDC parameters go here */};

CDC Log
CREATE TABLE company.employees_scylla_cdc_log (
    cdc$stream_id blob,
    cdc$time timeuuid,
    cdc$batch_seq_no int,
    cdc$operation tinyint,
    cdc$ttl bigint,
    department text,
    first_name text,
    last_name text,
    age int, cdc$deleted_age boolean,
    level int, cdc$deleted_level boolean,
    PRIMARY KEY (cdc$stream_id, cdc$time, cdc$batch_seq_no)
)
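The relationship between a base table and its CDC log can be modelled in memory: each write to the base table appends one row to the log, and a consumer reads the log later at its own pace. This is a rough sketch under stated assumptions. The class `ToyCdcTable`, the operation labels, the single fixed stream ID, and the offset-based `consume` helper are all made up for illustration; only the `cdc$` column names come from the log schema above.

```python
import uuid


class ToyCdcTable:
    """In-memory model of a base table plus its CDC log (hypothetical)."""

    def __init__(self):
        self.rows = {}
        self.log = []  # append-only list of log rows

    def _record(self, operation, key, columns):
        self.log.append({
            "cdc$stream_id": b"\x00",   # assumed: one stream only
            "cdc$time": uuid.uuid1(),   # time-based UUID, like timeuuid
            "cdc$batch_seq_no": 0,
            "cdc$operation": operation,  # made-up labels, not real codes
            "key": key,
            **columns,
        })

    def upsert(self, key, **columns):
        self.rows.setdefault(key, {}).update(columns)
        self._record("update", key, columns)

    def delete(self, key):
        self.rows.pop(key, None)
        self._record("row_delete", key, {})


def consume(table, offset):
    """Return log rows written since `offset`, plus the new offset."""
    new_rows = table.log[offset:]
    return new_rows, len(table.log)


employees = ToyCdcTable()
employees.upsert(("eng", "Doe", "John"), age=42, level=3)
employees.delete(("eng", "Doe", "John"))

changes, offset = consume(employees, 0)
print([c["cdc$operation"] for c in changes])  # ['update', 'row_delete']
```

The consumer never touches the base table, which is the point of the "asynchronously readable" bullet: it only needs ordinary reads against the log.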
CDC Write
RF=3
CL = QUORUM (2)
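The "(2)" next to QUORUM is the majority of the three replicas. A one-line helper makes the arithmetic explicit; the function name `quorum` is only for illustration.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


for rf in (1, 3, 5):
    print(rf, quorum(rf))
```

With RF=3 this gives 2, matching the slide.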
Comparison Chart
                        Cassandra      DynamoDB             MongoDB              Scylla
Consumer location       on-node        off-node             off-node             off-node
Replication             duplicated     deduplicated         deduplicated         deduplicated
Deltas                  yes            no                   partial              yes
Pre-image               no             yes                  no                   optional
Post-image              no             yes                  yes                  optional
Slow consumer reaction  Table stopped  Consumer loses data  Consumer loses data  Consumer loses data
Ordering                no             yes                  yes                  yes
Filtering               no             no                   ?                    yes
CDC under the Hood
■ CDC tables use TWCS (Time Window Compaction Strategy)
● TWCS major compaction (unrelated to CDC)
■ CQL BYPASS CACHE – per-query bypass hint for standard CQL
■ New: CDC tables shouldn’t be cached – per-table setting
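Why a time-window strategy suits a log whose rows are written in time order and expire after a TTL can be sketched as follows: writes are grouped by the window they fall into, and a whole window can be dropped once everything in it has expired. This is an illustrative model only, not Scylla's compaction code; the function names and the numbers are assumptions.

```python
from collections import defaultdict


def bucket_by_window(write_times, window_seconds):
    """Group write timestamps (seconds) by the time window they fall into."""
    windows = defaultdict(list)
    for ts in write_times:
        window_start = ts - ts % window_seconds
        windows[window_start].append(ts)
    return dict(windows)


def expired_windows(windows, window_seconds, now, ttl_seconds):
    """Windows whose newest possible entry is already past the TTL."""
    return sorted(
        start for start in windows
        if start + window_seconds + ttl_seconds <= now
    )


windows = bucket_by_window([5, 50, 65, 130], window_seconds=60)
print(windows)  # {0: [5, 50], 60: [65], 120: [130]}
print(expired_windows(windows, 60, now=200, ttl_seconds=100))  # [0]
```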
Repair Improvements
■ Row level repair
● Minimizes data transfer
● Minimizes disk reads
■ Repair Based Node Operations
● Bootstrap, decommission, replace, removenode, rebuild
● Node replace is _safe_
● Resumable in nature
● Efficient – rebuild will only bring the missing data
■ Tune scheduling priority to accelerate bootstrap – Coming
■ Adjust way of adding repair sstables to main dataset – Coming
■ Network scheduling – Future
■ IBF – Invertible Bloom Filter – Future
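The idea behind row-level repair, comparing replicas by per-row hashes and shipping only the rows that differ, can be sketched like this. It is a simplified model under stated assumptions: the hashing scheme and function names are hypothetical, replicas are plain dicts, and real repair also has to reconcile conflicting versions by timestamp, which is not shown.

```python
import hashlib


def row_hash(key, value):
    """Stable digest of one row; the encoding here is an assumption."""
    return hashlib.sha256(repr((key, value)).encode("utf-8")).hexdigest()


def rows_to_send(source, target):
    """Rows the source replica holds that the target lacks or has differently."""
    target_hashes = {row_hash(k, v) for k, v in target.items()}
    return {
        k: v for k, v in source.items()
        if row_hash(k, v) not in target_hashes
    }


replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_b = {"k1": "v1", "k2": "stale"}

print(rows_to_send(replica_a, replica_b))  # {'k2': 'v2', 'k3': 'v3'}
```

Only hashes need to be compared first, so the matching row `k1` is never transferred; that is where the "minimizes data transfer" bullet comes from.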
Kubernetes Operator
Easy to get started:
[Kubernetes Operator walkthrough, slides 1/5 to 5/5: images only]
Cassandra 4.0 Features
■ Java 11, ZGC
■ Netty
■ Zero copy networking
■ Virtual Tables
■ Transient Replication (experimental)
■ Incremental Repair
■ Audit Logging
■ Pluggable Storage Engine
The Benchmark
■ Max IOPS while latency < 10ms P99
■ 40 Cassandra nodes on i3.4xlarge
■ 4 Scylla nodes on i3.metal
■ Scylla delivers 600k ops at 12ms P99
■ Cassandra delivers 200k ops at 85ms P99
■ Scylla performs
● 3x throughput
● 7x better latency
● 2.5x cheaper deployment
● 10x fewer nodes
● 14x faster elastic scale out -> More cost
● 30x faster elastic scale in
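The first four ratios follow from the figures on this slide. The short calculation below reproduces them. The throughput, latency, and node counts come from the slide; the price relationship (one i3.metal costing four times one i3.4xlarge) is an assumption added here to show how a 2.5x cost figure could arise, and is not stated in the deck.

```python
scylla = {"nodes": 4, "ops": 600_000, "p99_ms": 12}
cassandra = {"nodes": 40, "ops": 200_000, "p99_ms": 85}

throughput_ratio = scylla["ops"] / cassandra["ops"]
latency_ratio = cassandra["p99_ms"] / scylla["p99_ms"]
node_ratio = cassandra["nodes"] / scylla["nodes"]

# Assumption (not from the slides): one i3.metal costs 4x one i3.4xlarge.
METAL_PRICE_IN_4XL_UNITS = 4
cost_ratio = cassandra["nodes"] / (scylla["nodes"] * METAL_PRICE_IN_4XL_UNITS)

print(throughput_ratio)         # 3.0
print(round(latency_ratio, 1))  # 7.1
print(node_ratio)               # 10.0
print(cost_ratio)               # 2.5
```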
i3.metal Cluster Guided Tour
Resources
■ Download Scylla Open Source
■ Scylla University
■ Test Drive
■ On-Demand Webinars
● Lightweight Transactions
● Change Data Capture (CDC)
● Scylla’s DynamoDB Compatible API
We’ll soon email you these links and the recording
Q&A
Thank you
