Zenly is a location-sharing mobile app that helps you find your friends’ whereabouts in real time. They have run Scylla in production for two and a half years. Learn how much they’ve scaled over that time while managing a growing cluster on Google Cloud Platform, and why they decided against Google Kubernetes Engine.
Zenly Shares Best Practices for Running ScyllaDB on GCP
1. ScyllaDB on GCP:
2 ½ years in production
Steeve Morin, software engineer
Jean-Baptiste Dalido, head of infrastructure
2. Presenters
Steeve Morin, software engineer
Hacking since the 90’s. Skydiver.
Jean-Baptiste Dalido, head of infrastructure
Hacking since the 00’s. Motorcycle enthusiast.
3. Meet
Zenly makes it fun & easy to know what friends & family are up to.
We believe that making maps more social and personalized is an opportunity to improve the way people live, communicate and spend time together.
50+ team members based in Paris.
4. ScyllaDB usage at Zenly
In 2 years:
■ 10x more ops/s: 300k/s to 3.3M/s
● 1.9M reads/s
● 1.4M writes/s
■ 2x more nodes: 27 to 50 nodes
● But from 10 to 32 cores/node
● 270 to 1600 cores
■ 4x more storage: 15TB to 59TB
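The rounded multipliers on this slide can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the ops-per-core ratio at the end is derived from them and is not a number the presenters state.

```python
# Figures copied from the slide; nothing else is assumed.
before = {"ops_per_s": 300_000, "nodes": 27, "cores_per_node": 10, "storage_tb": 15}
after = {"ops_per_s": 3_300_000, "nodes": 50, "cores_per_node": 32, "storage_tb": 59}


def growth(key):
    """Growth factor of one metric between the two snapshots."""
    return after[key] / before[key]


# Total cores = nodes * cores per node.
total_cores_before = before["nodes"] * before["cores_per_node"]
total_cores_after = after["nodes"] * after["cores_per_node"]

# Reads and writes should add up to the quoted total throughput.
reads_per_s, writes_per_s = 1_900_000, 1_400_000
total_ops_per_s = reads_per_s + writes_per_s

# Derived figure (not on the slide): throughput per core.
ops_per_core_before = before["ops_per_s"] / total_cores_before
ops_per_core_after = after["ops_per_s"] / total_cores_after

if __name__ == "__main__":
    print(f"ops/s growth:   {growth('ops_per_s'):.1f}x")
    print(f"node growth:    {growth('nodes'):.2f}x")
    print(f"storage growth: {growth('storage_tb'):.2f}x")
    print(f"core growth:    {total_cores_after / total_cores_before:.2f}x")
    print(f"ops/s per core: {ops_per_core_before:.0f} -> {ops_per_core_after:.0f}")
```

The node count roughly doubled, but total cores grew about sixfold, so most of the added capacity came from larger instances rather than more of them.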
5. Google Kubernetes Engine: why not
We wanted to stick with the default setup:
Pros:
■ GKE nodes auto repair/upgrade
■ Default GKE setup is nice
● Network
● Logs
■ Rolling updates
6. Google Kubernetes Engine: why not
Cons:
■ Auto upgrade means losing nodes every day
● Which means repairs...
■ Networking is expensive because of Docker
■ Not leveraging autoscaling
● HPA, VPA
■ CPU pinning is not easy to manage with “classical” neighbours
→ Not worth it in the end
7. GCP Network
Very happy so far:
■ Only two problems in the last 2 ½ years
● One was a Google software update gone wrong, took down lots of big players
■ Inter-zone network is very expensive
● Be careful when considering multi-DC clusters
8. Evolution of instance types
We went from custom-standard-10 (10 cores) to n1-standard-32 (32 cores):
■ More shards per node
■ Sweet spot for good disk efficiency
■ Considering moving to c2 or n2 instances
● c2: 40% more performance
● n2: 10-15% more performance
● No cost analysis yet
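As a rough illustration of what those percentages could mean for sizing, the sketch below estimates how many cores would deliver the same throughput as the current 1600. It assumes the quoted gains are per core and that throughput scales linearly with core count; the slide states neither, so treat the results as back-of-the-envelope figures only. The function name `cores_for_same_throughput` is hypothetical.

```python
import math

CURRENT_CORES = 1600  # 50 nodes x 32 cores, from the usage slide


def cores_for_same_throughput(per_core_gain):
    """Cores needed to match current throughput.

    ASSUMPTION: `per_core_gain` is a per-core speedup (0.40 means 40% faster)
    and throughput scales linearly with the number of cores.
    """
    return math.ceil(CURRENT_CORES / (1.0 + per_core_gain))


if __name__ == "__main__":
    print("c2 (+40%):", cores_for_same_throughput(0.40))
    print("n2 (+10%):", cores_for_same_throughput(0.10))
    print("n2 (+15%):", cores_for_same_throughput(0.15))
```

Whether fewer, faster cores would actually be cheaper depends on instance pricing, which the presenters had not yet analysed.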
9. Instance configuration
Instances are configured with:
■ 3 local NVMe SSDs in RAID 0
■ Ubuntu 18.04 LTS
■ Kernel linux-gcp 4.15.0.1029
● Major issues with NVMe on more recent kernels
● Stay on <= 1029 for now
● THANK YOU GLAUBER AND SUPPORT TEAM <3
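The kernel pin above could be enforced with a small pre-flight check before a node joins the cluster. This is a minimal sketch, not the presenters' tooling: it assumes the running kernel reports a release string of the form `4.15.0-1029-gcp`, and the helper names `gcp_kernel_build` and `is_known_good` are hypothetical.

```python
import platform
import re

MAX_GCP_BUILD = 1029  # highest linux-gcp 4.15 build the slide recommends


def gcp_kernel_build(release):
    """Return the build number from a release such as '4.15.0-1029-gcp'.

    ASSUMPTION: this release-string format. Returns None for anything else.
    """
    match = re.fullmatch(r"4\.15\.0-(\d+)-gcp", release)
    return int(match.group(1)) if match else None


def is_known_good(release):
    """True only for linux-gcp 4.15 kernels at or below the pinned build."""
    build = gcp_kernel_build(release)
    return build is not None and build <= MAX_GCP_BUILD


if __name__ == "__main__":
    release = platform.release()
    verdict = "OK" if is_known_good(release) else "not a known-good linux-gcp 4.15 kernel"
    print(f"{release}: {verdict}")
```

The check deliberately rejects any release it cannot parse, so an unexpected kernel fails loudly instead of passing by accident.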
10. Thank you. Stay in touch
Any questions?
Steeve Morin
steeve@zen.ly
@steeve
Jean-Baptiste Dalido
jb@zen.ly
@jbaptistedalido