KIWI.COM TAKES FLIGHT
WITH SCYLLA
Jan Plhak
Principal engineer at Kiwi.com
Presenter bio
Mathematician who turned to the Dark Side.
Working in the travel industry for 5 years now.
Currently principal engineer at Kiwi.com - big data,
distributed systems, fancy algorithmics, C++ devel...
INTRODUCTION
What is Kiwi.com
▪ “Provides a fare aggregator, metasearch engine and
booking for airline tickets.”
▪ Basically helps you figure out where you can fly within
your budget.
▪ Virtual interlining
What is Kiwi.com
What is Kiwi.com
▪ So we store some flights data…
▪ ±100 000 flights/day -> ±36M flights/year
▪ That’s a lot of data, right?
Even your phone can store that...
So we store some flights data
▪ Combinations...
▪ ±7G (billions) flight entries
▪ 350 000 writes/sec, 600 000 reads/sec
▪ 20TB in multiple replicas
Your phone can’t store that...
How we store the data
Rocky road to perfection...
Stage one
One Database
Stage two
Custom sharding, 60 databases + 60 x Redis, absolute joy
Stage three - Cassandra
▪ End of the dark ages
▪ Distributed, scalable
▪ Data replication
▪ Much better performance
Stage Scylla
▪ Currently migrating
▪ Allows us to scale even further
▪ Allows us to ditch many workarounds we had to
implement because of Cassandra
▪ More in Martin’s talk
Scylla migration - fun fact
▪ Our use case is very read-intensive (600 000 reads/sec)
▪ Many of these reads can be cached
▪ Cassandra relies on the OS page cache - very slow
▪ Our current solution:
Scylla migration - fun fact
▪ Scylla vs Cassandra benchmarking
▪ Same data, same cluster, same read structure
▪ Scylla - 900K reads/s vs Cassandra - 40K reads/s
MASSIVE FULL TABLE SCANS
Motivation
▪ Precomputation engine needs flights data
▪ Downloading all the data every hour
▪ + Secondary production, testing…
▪ = A lot of stress on production database
Motivation
▪ Stages 1 and 2 - direct downloading - Worked well
▪ Stage 3 - Cassandra + much more data
• Token ranges
• CPU overload
• Massive latency spikes over the whole system
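For reference, a token-range full table scan splits the partitioner's token ring into contiguous sub-ranges and issues one bounded query per sub-range. This is what spreads the work across nodes, and it is also the access pattern that overloaded Cassandra here. The C++ sketch below assumes the default Murmur3Partitioner, whose tokens are signed 64-bit integers. It also assumes a hypothetical table `flights` with partition key `id`; neither name comes from the talk.

```cpp
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// A token sub-range (start, end], matching CQL's
// "token(pk) > start AND token(pk) <= end" bounds.
struct TokenRange {
    int64_t start;  // exclusive
    int64_t end;    // inclusive
};

// Map an offset in [0, 2^64 - 1] from the ring minimum to a signed token.
// Unsigned addition wraps modulo 2^64; the cast back to int64_t is
// two's complement (guaranteed since C++20, universal in practice before).
int64_t token_at(uint64_t offset) {
    return static_cast<int64_t>(offset + (uint64_t{1} << 63));
}

// Split the whole Murmur3 ring [-2^63, 2^63 - 1] into `parts` contiguous ranges.
// The first range excludes the minimum token itself; Murmur3Partitioner maps
// that value to the maximum token, so this should not skip any rows.
std::vector<TokenRange> split_ring(uint64_t parts) {
    std::vector<TokenRange> ranges;
    if (parts == 0) return ranges;
    const uint64_t step = std::numeric_limits<uint64_t>::max() / parts;
    for (uint64_t i = 0; i < parts; ++i) {
        const int64_t start = token_at(i * step);
        // The last range is closed at the maximum token so nothing is left out.
        const int64_t end = (i + 1 == parts) ? std::numeric_limits<int64_t>::max()
                                             : token_at((i + 1) * step);
        ranges.push_back({start, end});
    }
    return ranges;
}

// One bounded scan statement per range (table and column names are hypothetical).
std::string range_query(const TokenRange& r) {
    return "SELECT * FROM flights WHERE token(id) > " + std::to_string(r.start) +
           " AND token(id) <= " + std::to_string(r.end) + ";";
}
```

Each statement can go to a different coordinator, ideally a replica of that range. However, the total amount of data read per scan is still the whole table.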
Why it failed
▪ Not very efficient implementation... Java...
▪ Re-reading all the data - very inefficient
▪ Idea - add “last_update_timestamp” column
• Select only recently updated entries
• Didn’t work - Cassandra still has to go through all the data
If only we could efficiently read only the
recently updated data...
Opening Pandora's box
▪ Cassandra flushes new data from memory to disk,
MemTable -> SSTable
▪ Every node holds multiple SSTables for each column family
▪ SSTables are immutable
And so we got an idea...
Opening Pandora's box
▪ Create a service that can detect and parse all newly created
SSTables - Splitters
▪ Stream the data to our distributed custom cache storage -
Mergers
▪ Feed our preprocessing engine with data from Mergers
▪ If Splitters are efficient, we can read the flights data with zero
impact on Cassandra’s performance
Masterplan diagram
Splitters
▪ Step 1 - Reverse-engineer the SSTable format from the Cassandra source code
▪ Step 2 - Implement fast SSTable parser in C++
▪ Step 3 - Implement mechanism for new SSTable detection
▪ Step 4 - Stream all the data to Mergers - including the
“last_update_timestamp”
▪ Step 5 - Deploy the Splitter on every Cassandra node
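Step 3 relies on SSTables being immutable. Once a data file appears, its contents never change, so a Splitter only has to notice file names it has not seen before. The following is a minimal polling sketch in C++17. It assumes Cassandra's usual per-table data directory, in which each SSTable's data component ends in `-Data.db` (for example `mc-1-big-Data.db`). The function names are hypothetical. A real Splitter would also have to make sure a flush has finished writing the file before parsing it.

```cpp
#include <algorithm>
#include <filesystem>
#include <set>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// True if `name` looks like an SSTable data component, e.g. "mc-1-big-Data.db".
bool is_data_component(const std::string& name) {
    const std::string suffix = "-Data.db";
    return name.size() > suffix.size() &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Return data files not seen before (sorted) and remember them.
// SSTables are immutable, so a file seen once never needs to be parsed again.
std::vector<std::string> filter_new_sstables(const std::vector<std::string>& names,
                                             std::set<std::string>& seen) {
    std::vector<std::string> fresh;
    for (const auto& n : names) {
        if (is_data_component(n) && seen.insert(n).second) fresh.push_back(n);
    }
    std::sort(fresh.begin(), fresh.end());
    return fresh;
}

// List regular file names in one table directory (non-recursive).
std::vector<std::string> list_files(const fs::path& table_dir) {
    std::vector<std::string> names;
    for (const auto& entry : fs::directory_iterator(table_dir)) {
        if (entry.is_regular_file()) names.push_back(entry.path().filename().string());
    }
    return names;
}
```

Calling `list_files` followed by `filter_new_sstables` every few seconds yields each new SSTable once per process lifetime. On Linux, inotify is an alternative to polling. The `seen` set would also need pruning once compaction deletes old SSTables.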
Mergers
▪ Distributed storage, accepting data from Splitters
▪ Sharding based on logical key in our data - useful for
precomputation and streaming to our Engine
▪ Replication factor of 1 - If any node fails, remaining nodes have
to take its shards - restream everything!
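The Mergers' sharding and the cost of replication factor 1 can be illustrated with a small C++ sketch. It assumes a fixed number of shards addressed by hashing a logical key; the talk does not say which key, so the route string in the test is hypothetical. Shards are mapped to nodes round-robin. When a node fails, its shards move to the survivors. With no replica to copy from, they have to be rebuilt from scratch.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Shard index for a logical key; num_shards must be > 0. std::hash is only
// stable within one build, so a real system would pin a specific hash function.
std::size_t shard_of(const std::string& logical_key, std::size_t num_shards) {
    return std::hash<std::string>{}(logical_key) % num_shards;
}

// shard -> node. With replication factor 1 each shard lives on exactly one node.
using Assignment = std::map<std::size_t, std::string>;

Assignment assign_round_robin(std::size_t num_shards, const std::vector<std::string>& nodes) {
    if (nodes.empty()) throw std::invalid_argument("no Merger nodes");
    Assignment a;
    for (std::size_t s = 0; s < num_shards; ++s) a[s] = nodes[s % nodes.size()];
    return a;
}

// Hand the failed node's shards to the survivors and return the moved shards.
// Nothing replicates them, so every moved shard has to be rebuilt by re-streaming.
std::vector<std::size_t> fail_node(Assignment& a, const std::string& failed,
                                   const std::vector<std::string>& survivors) {
    if (survivors.empty()) throw std::invalid_argument("no surviving Merger nodes");
    std::vector<std::size_t> moved;
    std::size_t next = 0;
    for (auto& [shard, node] : a) {
        if (node == failed) {
            node = survivors[next++ % survivors.size()];
            moved.push_back(shard);
        }
    }
    return moved;
}
```

Splitters see data in Cassandra's token order rather than in this logical-key order. Rebuilding even one moved shard therefore means re-reading all the data, which is the "complete reload" listed under Problems.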
Problems
▪ MemTable -> SSTable latency (±undefined)…
▪ … and eventual consistency - Splitters on all replicas ...
▪ … some data could be missing
▪ Cassandra’s sharding vs ours - Merger failure -> complete reload
▪ Depending on internal format - zero support, no guarantees,
problematic documentation, insane
▪ Additional development, it took some time to get right
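Because a Splitter runs on every replica, a Merger receives each row once per replica, possibly out of order. One way to make those duplicates harmless is a last-write-wins merge keyed by row. This sketch assumes every streamed row carries its `last_update_timestamp`, as in Step 4 above. The `FlightRow` layout is hypothetical. Note that this does nothing for rows that have not yet been flushed to an SSTable.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical row shape; only last_update_timestamp comes from the talk.
struct FlightRow {
    std::string key;
    int64_t last_update_timestamp;
    std::string payload;
};

class MergerShard {
public:
    // Apply a row received from any Splitter; returns true if the stored state changed.
    // Equal timestamps keep the copy that arrived first.
    bool apply(const FlightRow& row) {
        auto it = rows_.find(row.key);
        if (it == rows_.end()) {
            rows_.emplace(row.key, row);
            return true;
        }
        if (row.last_update_timestamp > it->second.last_update_timestamp) {
            it->second = row;
            return true;
        }
        return false;  // duplicate from another replica, or an older version
    }

    const FlightRow* get(const std::string& key) const {
        auto it = rows_.find(key);
        return it == rows_.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return rows_.size(); }

private:
    std::unordered_map<std::string, FlightRow> rows_;
};
```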
The good things
▪ Allows us to do frequent full-data dumps
▪ Performance
• Our C++ parser is very fast
• During normal operation - near-zero load on DB servers
▪ Zero impact on production DB - complete isolation
▪ Mergers - custom built for our use case - very efficient
WHAT SCYLLA CHANGED
Scylla is better
▪ Currently migrating, some problems (Scylla is too good)
▪ Testing -> continuous full table scans - filter for
“last_update_timestamp”
▪ Using token ranges - Scylla can handle it, no overloading
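On Scylla, the "continuous full table scans filtered on last_update_timestamp" approach boils down to one statement per token range with an extra predicate. Because `last_update_timestamp` is a regular (non-key) column, CQL only accepts that predicate together with `ALLOW FILTERING`. The database still reads every row in the range; only the result set shrinks. The sketch below uses the hypothetical table `flights` with partition key `id`. It also keeps a watermark so that each scan asks only for rows updated since the previous scan started.

```cpp
#include <cstdint>
#include <string>

// One filtered, token-bounded scan statement (table and key names are hypothetical).
std::string incremental_scan_query(int64_t token_start, int64_t token_end, int64_t since) {
    return "SELECT * FROM flights WHERE token(id) > " + std::to_string(token_start) +
           " AND token(id) <= " + std::to_string(token_end) +
           " AND last_update_timestamp > " + std::to_string(since) + " ALLOW FILTERING;";
}

// Tracks the lower bound for the next scan. begin_scan() returns the bound to use
// and records the scan's start time; finish_scan() commits it only after the scan
// succeeded, so a failed scan is simply retried with the old bound.
class ScanWatermark {
public:
    explicit ScanWatermark(int64_t initial) : since_(initial) {}
    int64_t begin_scan(int64_t now) {
        pending_ = now;
        return since_;
    }
    void finish_scan() { since_ = pending_; }

private:
    int64_t since_;
    int64_t pending_ = 0;
};
```

Taking the watermark from the previous scan's start time, rather than its end, means rows written during a scan are picked up by the next scan even if the current one missed them. This assumes the timestamps written by clients are close to the scanner's clock. A safety margin could be subtracted to absorb clock skew.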
What’s next?
▪ SSTable parser removal - Amazing!!!
▪ Two possible scenarios
a. Keep Splitters and read preferably local token ranges (complex)
b. Keep only Mergers and read the data directly (much easier)
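Scenario (a) requires each Splitter to know which token ranges its own node holds a replica of. The following simplified C++ sketch assumes one token per node (no vnodes) and SimpleStrategy-style placement. Under that placement, the range (previous token, token] is stored on the node owning that token and on the next rf - 1 nodes clockwise. Real clusters usually use vnodes and NetworkTopologyStrategy. In practice the ranges would therefore come from the cluster's own metadata, for example `nodetool describering <keyspace>`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct RingEntry {
    int64_t token;     // the node's single token
    std::string node;  // node name
};

// Range (start, end]; start >= end means it wraps past the maximum token
// (start == end covers the whole ring).
struct OwnedRange {
    int64_t start;
    int64_t end;
};

// Ranges for which `node` holds a replica under SimpleStrategy-style placement:
// range (prev_token, token] lives on the owner of `token` plus the next rf - 1
// nodes clockwise.
std::vector<OwnedRange> local_ranges(std::vector<RingEntry> ring, const std::string& node,
                                     std::size_t rf) {
    std::sort(ring.begin(), ring.end(),
              [](const RingEntry& a, const RingEntry& b) { return a.token < b.token; });
    std::vector<OwnedRange> out;
    const std::size_t n = ring.size();
    for (std::size_t i = 0; i < n; ++i) {
        const int64_t prev = ring[(i + n - 1) % n].token;
        for (std::size_t r = 0; r < std::min(rf, n); ++r) {
            if (ring[(i + r) % n].node == node) {
                out.push_back({prev, ring[i].token});
                break;
            }
        }
    }
    return out;
}
```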
Problems
▪ MemTable -> SSTable latency (±undefined)…
▪ … and eventual consistency - Splitters on all replicas ...
▪ … some data could be missing
▪ Cassandra’s sharding vs ours - Merger failure -> complete reload
▪ Depending on internal format - zero support, no guarantees,
problematic documentation, insane
▪ Additional development, it took some time to get right
Thank You
Any questions?
Please stay in touch
jan.plhak@kiwi.com

Scylla Summit 2018: Kiwi.com Takes Flight with Scylla


Speaker notes

  1. Someone had a very good idea - we will do custom sharding! But then Postgres started to fail due to the high read count, so guess what, people had another great idea! We will use Redis! Great thing to maintain.
  2. Who thinks Redis
  3. Scylla is on the left. We will be able to remove the wall of Redises.
  4. One of the workarounds will be the main topic of this presentation
  5. Our engine is always hungry
  6. So have you heard of SSTables?
  7. Have you heard of Java? We will get to last_update_timestamp later
  8. Why is it ok to have replication factor 1? Mention last_update_timestamp
  9. The main good thing is that it actually WORKS! In production for more than a year. Even if things get out of hand, we only overload Mergers.