Debates about scaling can often be abstract. Debaters may not even have genuine scaling issues. The rationale for one strategy over another is often subjective preference rather than something borne out by experience. This is definitely not the case for this talk. We will discuss the very real scaling issues at lastminute.com, highlighting not just how Kubernetes helped but also the context around those strategy decisions.
4. Micro-problems at scale
● alignment
● real pipelines
● infrastructure
● resilience
● monitoring
● constraints
5. A year-long endeavour
● build a new, modern infrastructure
● migrate the search (flight/hotel) product there
... without:
● impacting the business
● throwing away our whole datacenter
6. How we did that: technology
● company framework
● docker
● kubernetes
7. How we did that: team/people
https://www.pexels.com/photo/blue-lego-toy-beside-orange-and-white-lego-toy-standing-during-daytime-105822/
14. Self-healing: our choice for resilience
/liveness:
● when tomcat container is up
● when “active/max” threads < threshold
/readiness:
● all the startup jobs have run
● no termination request has been received
.. ongoing never-ending research ..
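The /liveness and /readiness conditions on this slide can be sketched as two small predicates. This is only an illustration, not the code behind the talk: the `ServiceState` fields, the function names and the 0.9 threshold are all assumptions, since the slide names a threshold without giving its value.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    """Hypothetical snapshot of one service instance (field names are illustrative)."""
    container_up: bool
    active_threads: int
    max_threads: int
    startup_jobs_done: bool
    termination_requested: bool


# Assumed value: the slide only says "threshold", not what it is.
THREAD_RATIO_THRESHOLD = 0.9


def is_live(state: ServiceState, threshold: float = THREAD_RATIO_THRESHOLD) -> bool:
    """Liveness: container is up and the active/max thread ratio is below the threshold."""
    if not state.container_up or state.max_threads <= 0:
        return False
    return state.active_threads / state.max_threads < threshold


def is_ready(state: ServiceState) -> bool:
    """Readiness: startup jobs have run and no termination request has been received."""
    return state.startup_jobs_done and not state.termination_requested


def status_code(ok: bool) -> int:
    """Map a check result to an HTTP status: 200 passes an HTTP probe, 503 fails it."""
    return 200 if ok else 503
```

With Kubernetes HTTP probes, a failing liveness check leads to a container restart, while a failing readiness check only takes the pod out of the Service endpoints. An instance that has received a termination request can therefore stay live while no longer being ready.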
16. When can you test with production traffic?
● zero downtime during rollout
● monitoring in place
● alerting
● centralized logging
● legacy infrastructure to the rescue in case of problem
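The slide does not say how the legacy infrastructure took over when something went wrong. One possible reading, sketched here purely as an assumption, is a per-request fallback: try the new cluster first and serve from legacy if the call fails. `route_with_fallback` and the two backend callables are hypothetical names, not part of any real library.

```python
from typing import Callable, Tuple

# A backend is modelled as a function from a request string to a response string.
Backend = Callable[[str], str]


def route_with_fallback(request: str, new_cluster: Backend, legacy: Backend) -> Tuple[str, str]:
    """Try the new cluster first; on any error, serve the request from legacy.

    Returns (backend_name, response) so callers can log or alert on fallbacks.
    """
    try:
        return "new", new_cluster(request)
    except Exception:
        # Hypothetical policy: every failure falls back, no retries.
        return "legacy", legacy(request)
```

Returning the backend name alongside the response keeps the fallback visible to monitoring and alerting, which the same slide lists as prerequisites.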
17. ... failure ... at all different levels ..
https://www.flickr.com/photos/ghost_of_kuji/2763674926
22. ... benefits
● lead and migration time
● resilience
● root cause analysis
● speed of deployment
● instant scaling
23. Give me the numbers!
● 36 bare-metal nodes (only for production cluster)
● 5100 req/sec in the new cluster
● 2M metrics/minute flows
● 35 micro-services migrated in 5 months
○ 3 new micro-services migrated per week
○ 10 minutes to create a new environment
● 11 min to roll-out a new version with 55 instances
○ whole pipeline runs in 16 min
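A few per-unit averages follow from the figures on this slide by simple arithmetic. The sketch below assumes traffic is spread evenly across the 36 nodes and that the rollout time divides evenly across the 55 instances. Neither assumption is stated in the talk, so the results are rough averages only.

```python
# Figures quoted on the slide.
NODES = 36
REQ_PER_SEC = 5100
METRICS_PER_MIN = 2_000_000
ROLLOUT_MIN = 11
INSTANCES = 55

# Derived averages (assumes even distribution; see the note above).
req_per_node = REQ_PER_SEC / NODES               # requests per second per node
metrics_per_sec = METRICS_PER_MIN / 60           # metrics per second
sec_per_instance = ROLLOUT_MIN * 60 / INSTANCES  # rollout seconds per instance

print(f"{req_per_node:.1f} req/sec per node")     # 141.7 req/sec per node
print(f"{metrics_per_sec:.0f} metrics/sec")       # 33333 metrics/sec
print(f"{sec_per_instance:.0f} s per instance")   # 12 s per instance
```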