Kiwi.com provides a powerful flight, train, and bus search engine driven by volatile data: entries expire within just a couple of days. The compute engine loads data from the cluster every couple of hours, running in a blue-green deployment and conducting several simultaneous A/B tests. To keep full table scans predictable, Kiwi.com implemented a dedicated cache to store post-processed results from the database. Where Cassandra's limitations forced the team to implement a custom scanning service that reads newly created SSTables and streams updates to the cache, Scylla made performant full table scans easy and safe. This talk covers our Cassandra-to-Scylla migration, benchmarking on GCP and bare-metal OVH, and the resulting performance numbers, with a primary focus on full table scans.
2. Presenter bio
Mathematician who turned to the Dark Side.
Working in the travel industry for 5 years now.
Currently principal engineer at Kiwi.com - big data,
distributed systems, fancy algorithmics, C++ devel...
4. What is Kiwi.com
▪ “Provides a fare aggregator, metasearch engine and
booking for airline tickets.”
▪ Basically helps you figure out where you can fly within
your budget.
▪ Virtual interlining
6. What is Kiwi.com
▪ So we store some flights data…
▪ ±100 000 flights/day -> ±36M flights/year
▪ That’s a lot of data right?
Even your phone can store that...
7. So we store some flights data
▪ Combinations...
▪ ±7G (billion) flight entries
▪ 350 000 writes/sec, 600 000 reads/sec
▪ 20TB in multiple replicas
Your phone can’t store that...
8. How we store the data
Rocky road to perfection...
11. Stage three
▪ End of dark ages
▪ Distributed, scalable
▪ Data replication
▪ Much more performance
12. Stage Scylla
▪ Currently migrating
▪ Allows us to scale even further
▪ Allows us to ditch many workarounds we had to
implement because of Cassandra
▪ More in Martin’s talk
13. Scylla migration - fun fact
▪ Our use case is very read-intensive (600 000 reads/sec)
▪ Many of these reads can be cached
▪ Cassandra uses system cache - very slow
▪ Our current solution: a wall of Redis caches in front of Cassandra
14. Scylla migration - fun fact
▪ Scylla vs Cassandra benchmarking
▪ Same data, same cluster, same read structure
▪ Scylla - 900k reads/s vs Cassandra - 40k reads/s
16. Motivation
▪ Precomputation engine needs flights data
▪ Downloading all the data every hour
▪ + Secondary production, testing…
▪ = A lot of stress on production database
17. Motivation
▪ Stages 1 and 2 - direct downloading - Worked well
▪ Stage 3 - Cassandra + much more data
• Token ranges
• CPU overload
• Massive latency spikes over the whole system
18. Why it failed
▪ Not very efficient implementation... Java...
▪ Re-reading all the data - very inefficient
▪ Idea - add “last_update_timestamp” column
• Select only recently updated entries
• Didn’t work - Cassandra still has to go through all the data
If only we could efficiently read only the
recently updated data...
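The failure above can be illustrated with a short sketch (hypothetical data, not Kiwi.com code): filtering on a plain column like `last_update_timestamp` does not narrow the scan, because Cassandra has no index suited to the predicate. The server still reads every row and discards the stale ones, so the cost is proportional to all rows, not to the recently updated ones.

```python
# Simulated table: 100k rows, timestamps cycling 0..999.
rows = [{"id": i, "last_update_timestamp": i % 1000} for i in range(100_000)]

visited = 0
recent = []
for row in rows:                       # the server still walks ALL rows...
    visited += 1
    if row["last_update_timestamp"] > 995:
        recent.append(row)             # ...to return only a tiny slice

assert visited == len(rows)            # full scan regardless of selectivity
```

Only 400 of 100,000 rows match, yet all 100,000 were read - which is why the `last_update_timestamp` idea alone could not help.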
19. Opening Pandora's box
▪ Cassandra flushes new data from memory to disk,
MemTable -> SSTable
▪ Every node holds multiple SSTables for each column family
▪ SSTables are immutable
And so we got an idea...
20. Opening Pandora's box
▪ Create a service that can detect and parse all newly created
SSTables - Splitters
▪ Stream the data to our distributed custom cache storage -
Mergers
▪ Feed our preprocessing engine with data from Mergers
▪ If Splitters are efficient, we can read the flights data with zero
impact on Cassandra’s performance
22. Splitters
▪ Step 1 - Reverse-engineer SSTable format from Cassandra src
▪ Step 2 - Implement fast SSTable parser in C++
▪ Step 3 - Implement mechanism for new SSTable detection
▪ Step 4 - Stream all the data to Mergers - including the
“last_update_timestamp”
▪ Step 5 - deploy the Splitter on every Cassandra node
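Step 3 above can be sketched as a simple polling loop. The real Splitter is a C++ service on every Cassandra node; this Python illustration only assumes that Cassandra names the data component of each flushed SSTable `<...>-Data.db` and that SSTables are immutable, so a file name seen once never needs re-checking.

```python
import os

def new_sstables(data_dir, seen):
    """Return paths of SSTable data files not parsed yet; update `seen`.

    Because SSTables are immutable, remembering file names is enough -
    a file never changes after it appears.
    """
    fresh = []
    for name in sorted(os.listdir(data_dir)):
        if name.endswith("-Data.db") and name not in seen:
            seen.add(name)
            fresh.append(os.path.join(data_dir, name))
    return fresh
```

In the Splitter, each fresh file would be handed to the C++ parser and the rows (including `last_update_timestamp`) streamed on to the Mergers.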
23. Mergers
▪ Distributed storage, accepting data from Splitters
▪ Sharding based on logical key in our data - useful for
precomputation and streaming to our Engine
▪ Replication factor of 1 - If any node fails, the remaining nodes have
to take its shards - restream everything!
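The sharding scheme above can be sketched as follows (illustrative names, not the real Merger code): each entry is placed by a hash of its logical key, so all data for one key lands on one Merger, and with replication factor 1 every shard that changes owner after a node failure must be restreamed from scratch.

```python
import hashlib

NUM_SHARDS = 64

def shard_of(logical_key: str) -> int:
    """Map a logical key to a shard deterministically."""
    digest = hashlib.md5(logical_key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def assign_shards(nodes):
    """Round-robin the shards over the currently live Merger nodes."""
    return {shard: nodes[shard % len(nodes)] for shard in range(NUM_SHARDS)}

before = assign_shards(["m1", "m2", "m3"])
after = assign_shards(["m1", "m3"])            # m2 died
moved = [s for s in range(NUM_SHARDS) if before[s] != after[s]]
# every shard in `moved` must be restreamed - RF=1 keeps no second copy
```

A plain modulo assignment like this moves many shards on membership change, which matches the "restream everything" pain point on the slide.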
24. Problems
▪ MemTable -> SSTable latency (±undefined)…
▪ … and eventual consistency - Splitters on all replicas ...
▪ … some data could be missing
▪ Cassandra’s vs our sharding - Merger failure -> complete reload
▪ Depending on internal format - zero support, no guarantees,
problematic documentation, insane
▪ Additional development, it took some time to get right
25. The good things
▪ Allows us to do frequent full-data dumps
▪ Performance
• Our C++ parser is very fast
• During normal operation - near-zero load on DB servers
▪ Zero impact on production DB - complete isolation
▪ Mergers - custom built for our use case - very efficient
27. Scylla is better
▪ Currently migrating, some problems (Scylla is too good)
▪ Testing -> continuous full table scans - filter for
“last_update_timestamp”
▪ Using token ranges - Scylla can handle, no overloading
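The token-range scan above works by covering the Murmur3 ring with many small sub-ranges, each issued as its own query (conceptually `SELECT ... WHERE token(pk) > start AND token(pk) <= end`), a load Scylla absorbs without overloading. A minimal sketch of the range arithmetic, assuming the default Murmur3 partitioner's token space (illustrative, not our production code):

```python
# Murmur3 partitioner token space: [-2**63, 2**63 - 1]
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1

def token_subranges(n):
    """Split the full ring into n contiguous (start, end] sub-ranges."""
    span = MAX_TOKEN - MIN_TOKEN
    step = span // n
    start = MIN_TOKEN
    for i in range(n):
        end = MAX_TOKEN if i == n - 1 else start + step
        yield (start, end)
        start = end

ranges = list(token_subranges(512))
```

Each sub-range then becomes one small query, optionally combined with the `last_update_timestamp` filter, so the scan is spread evenly across the cluster.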
28. What’s next?
▪ SSTable parser removal - Amazing!!!
▪ Two possible scenarios
a. Keep splitters and read preferably local token ranges (Complex)
b. Keep only Mergers and read the data directly (Much easier)
29. Problems
▪ MemTable -> SSTable latency (±undefined)…
▪ … and eventual consistency - Splitters on all replicas ...
▪ … some data could be missing
▪ Cassandra’s vs our sharding - Merger failure -> complete reload
▪ Depending on internal format - zero support, no guarantees,
problematic documentation, insane
▪ Additional development, it took some time to get right
Someone had a very good idea - we will do custom sharding!
But then Postgres started to fail under the high read count, so guess what, people had another great idea!
We will use Redis! A great thing to maintain.
Who thinks Redis
Scylla is on the left.
We will be able to remove the wall of redises.
One of the workarounds will be the main topic of this presentation
Our engine is always hungry
So have you heard of SSTables?
Have you heard of Java?
We will get to last_update_timestamp later
Why is it ok to have replication factor 1?
Mention last_update_timestamp
The main good thing is that it actually WORKS! It has been in production for more than a year.
Even if things get out of hand, we only overload Mergers