Kubernetes and lastminute.com: our course towards better scalability and processes (Codemotion Milan 2016)

In one year we migrated a full set of micro-services to a new infrastructure based on Kubernetes and Docker.
I will present how we got there, describing the real-life challenges, the problems we faced and the solutions we found.

  1. Kubernetes and lastminute.com group: our course towards better scalability and processes michele.orsi@lastminute.com Milan, 25-26 November 2016
  2. The inspiring travel company
  3. lastminute.com group in numbers ● 40 countries ● 17 languages ● 10M travellers per year* ● € 2.5B GTV* ● € 250M revenue* ● 43M users per month* *data as of 31st December 2015. Icons from http://www.flaticon.com
  4. A tech company to the core ● Tech department: 300+ people ● Modules: ~100 ● Database: 150 schemas, 3300 tables, TB data ● Instances: 1400+ ● Locations: Chiasso, Milan, Madrid, London, Bengaluru
  5. “Business thinks developers are slow” https://www.pexels.com/photo/turtle-walking-on-sand-132936/
  6. lastminute.com group: an agile company ● Scrum and Kanban ● TDD ● clean code ● continuous integration ● code review ● internal communities
  7. Starting from the monolith ... https://www.flickr.com/photos/southtopia/5702790189
  8. ... broken into microservices https://www.pexels.com/photo/gray-pebbles-with-green-grass-51168/
  9. The improvements needed ● alignment ● real pipelines ● infrastructure ● resilience ● monitoring ● remove constraints
  10. A year-long endeavour ● build a new, modern infrastructure ● migrate the search (flight/hotel) product there ... without: ● impacting the business ● throwing away our whole datacenter
  11. TODO list ● company framework ● docker ● kubernetes
  12. How? Teams and people. New teams https://www.pexels.com/photo/blue-lego-toy-beside-orange-and-white-lego-toy-standing-during-daytime-105822/
  13. Our infrastructure and technology https://www.pexels.com/photo/colorful-toothed-wheels-171198/
  14. Docker containers ● build once, run everywhere ● externalised configuration
  15. Docker containers ● build once, run everywhere ● externalised configuration. Image registry.intra/application:v2-090025112016: BASE OS, JAVA SDK, START/STOP SCRIPTS, JAR APPLICATION
  16. Kubernetes ● independent from OS/hosts ● isolated env, managed at scale ● self-healing ● externalised configuration. Omega paper: http://research.google.com/pubs/pub41684.html
  17. “Your infrastructure on wheels” https://www.pexels.com/photo/red-toy-truck-24619/
  18. Kubernetes: physical representation. Cluster of nodes (NODE 1, NODE 2, ... NODE 28), each with DOCKER, ETCD, K8S, FLANNEL
  19. Kubernetes: logical representation. Cluster with NAMESPACE1 (CPU 10, MEM 40GB), NAMESPACE2 (CPU 20, MEM 50GB), NAMESPACE3 (CPU 80, MEM 60GB), NAMESPACE4 (CPU 5, MEM 5GB)
  20. Kubernetes: our architecture. production: APP1-PRODUCTION, APP2-PRODUCTION, APP3-PRODUCTION; nonproduction: APP1-PREVIEW, APP1-DEVELOPMENT, APP1-QA, APP1-STRESSTEST (each stacked with boxes labelled APP2-PRODUCTION and APP3-PRODUCTION)
  21. Kubernetes: our architecture and choices. APP1-PRODUCTION (production): deployment, replica-set, POD 1, POD 2, POD 3
  22. Kubernetes: our architecture and choices. APP1-PRODUCTION (production): deployment, replica-set, secret, configmap, POD 1, POD 2, POD 3
  23. Kubernetes: our architecture and choices. APP1-PRODUCTION (production): app1.lastminute.intra, loadbalancer-app1, deployment, replica-set, secret, configmap, POD 1, POD 2, POD 3
  24. Kubernetes: our architecture and choices. APP1-PRODUCTION (production): POD with application, collectd, fluentd
  25. Kubernetes: what’s left outside? ● datastores ● distributed caches ● distributed locking ● pub-sub ● logs and metrics storage
  26. 1st try (with test app), it seemed to work https://www.flickr.com/photos/26516072@N00/2194001232
  27. The self-healing term describes any application, service, or a system that can discover that it is not working correctly and, without any human intervention, make the necessary changes to restore itself to the normal or designed state. Self-healing ref: https://technologyconversations.com/2016/01/26/self-healing-systems
  28. Kubernetes agnostic interfaces “When a container is dead I will restart it” “When a container is ready I will forward traffic to it”
  29. Kubernetes probes: liveness & readiness. Two questions for dev: ● when can I consider my container alive? ● when can I consider my container ready to receive traffic? deployment.yaml:
        spec:
          containers:
          - livenessProbe:
              httpGet:
                path: /liveness
              successThreshold: 3
              failureThreshold: 2
            readinessProbe:
              httpGet:
                path: /readiness
              successThreshold: 3
              failureThreshold: 2
  30. Our choices: framework - k8s. /liveness: ● when the tomcat container is up ● when the “active/max” thread ratio is lower than a threshold. /readiness: ● all the startup jobs have run ● no termination request has been received .. ongoing never-ending research ..
  31. 2nd try (with production traffic) ● zero downtime during rollout ● monitoring in place ● alerting ● centralized logging ● legacy infrastructure to the rescue in case of problems
  32. ... failure ... the big one! https://www.flickr.com/photos/ghost_of_kuji/2763674926
  33. Problems ● configuration ● infrastructure ● tools ● manual mistakes ● (external) scalability
  34. Another improvement step ● temporary team focus on objective ● automation ● monitoring ● go deeper in docker/kubernetes
  35. Pipeline: a huge step forward. pipeline:
        microservice = factory.newDeployRequest()
            .withArtifact("com.lastminute.application1", 2)
        lmn_deployCanaryStrategy(microservice, "qa")
        lmn_deployStableStrategy(microservice, "preview")
        lmn_deployCanaryStrategy(microservice, "production")
  36. Monitoring: grafana/graphite/nagios. Cluster with APP1-PRODUCTION, n collectd, graphite, Grafana, nagios. Icons from http://www.flaticon.com
  37. “Go” deep .. whatever language it takes https://www.pexels.com/photo/sea-man-person-ocean-2859/
  38. There’s a light .. at the end https://www.pexels.com/photo/grayscale-photography-of-person-at-the-end-of-tunnel-211816/
  39. ... benefits ● lead and migration time ● resilience ● root cause analysis ● speed of deployment ● instant scaling
  40. Give me the numbers! ● 1300 req/sec in the new cluster ● 25 micro-services migrated in 4 months ● 1 week to migrate an application ● 10 minutes to create a new environment ● 11 min to gracefully roll out a new version with 55 instances ● whole pipeline runs in 16 min ● 1.5M metrics/minute flowing
  41. Yes, we’re hiring! THANKS www.lastminutegroup.com
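
Slide 19 shows the cluster carved into namespaces, each with its own CPU and memory budget. As a rough illustration of what such a budget implies, the sketch below models a per-namespace quota that rejects any request that would exceed it, which is how Kubernetes resource quotas behave at admission time. The class and function names are hypothetical, the units of "CPU" are assumed to be whatever the slide meant, and only the four namespace figures come from the slide.

```python
from dataclasses import dataclass

# Quotas as shown on slide 19: (CPU, memory in GB) per namespace.
SLIDE_QUOTAS = {
    "NAMESPACE1": (10, 40),
    "NAMESPACE2": (20, 50),
    "NAMESPACE3": (80, 60),
    "NAMESPACE4": (5, 5),
}


@dataclass
class NamespaceQuota:
    """Hypothetical model of one namespace budget (not a Kubernetes API)."""
    cpu_limit: float
    mem_limit_gb: float
    cpu_used: float = 0.0
    mem_used_gb: float = 0.0

    def admit(self, cpu: float, mem_gb: float) -> bool:
        """Accept the request only if it fits in what is left of the quota."""
        if self.cpu_used + cpu > self.cpu_limit:
            return False
        if self.mem_used_gb + mem_gb > self.mem_limit_gb:
            return False
        self.cpu_used += cpu
        self.mem_used_gb += mem_gb
        return True


def cluster_totals(quotas):
    """Sum the CPU and memory budgets over all namespaces."""
    cpu = sum(c for c, _ in quotas.values())
    mem = sum(m for _, m in quotas.values())
    return cpu, mem
```

With the slide's figures the budgets add up to 115 CPU and 155 GB, and a namespace as small as NAMESPACE4 accepts two requests of 2 CPU / 2 GB before refusing a third.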
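
Slide 29 sets successThreshold: 3 and failureThreshold: 2 on both probes. The sketch below is a small model, not Kubernetes code, of what consecutive-result thresholds mean: the state turns healthy only after that many successes in a row and turns unhealthy only after that many failures in a row. The class name and the choice to start in the unhealthy state (as a readiness probe does) are assumptions made for illustration. Note that the Kubernetes API documentation requires successThreshold to be 1 for liveness probes, so the value 3 is only meaningful for readiness.

```python
from dataclasses import dataclass


@dataclass
class ProbeState:
    """Hypothetical model of a probe with consecutive-result thresholds."""
    success_threshold: int = 3
    failure_threshold: int = 2
    healthy: bool = False  # assumption: start unhealthy, like readiness
    _successes: int = 0
    _failures: int = 0

    def record(self, ok: bool) -> bool:
        """Feed one probe result and return the resulting state."""
        if ok:
            self._successes += 1
            self._failures = 0
            if not self.healthy and self._successes >= self.success_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0
            if self.healthy and self._failures >= self.failure_threshold:
                self.healthy = False
        return self.healthy


def replay(results, **thresholds):
    """Return the state after each probe result in the sequence."""
    state = ProbeState(**thresholds)
    return [state.record(r) for r in results]
```

A single failed check between successes does not flip the state; two in a row do, which is the point of setting failureThreshold above 1.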
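
The /liveness and /readiness rules on slide 30 can be written down as two predicates. This is a minimal sketch under stated assumptions: the AppStatus fields, the 0.9 thread-ratio threshold and the 200/503 status codes are illustrative choices, not values from the talk. Kubernetes treats any HTTP status from 200 up to but excluding 400 as a successful probe, so any failing code outside that range would do.

```python
from dataclasses import dataclass


@dataclass
class AppStatus:
    """Hypothetical snapshot of what the company framework could expose."""
    container_up: bool
    active_threads: int
    max_threads: int
    startup_jobs_done: bool
    termination_requested: bool


def is_alive(status: AppStatus, max_busy_ratio: float = 0.9) -> bool:
    """Alive: container is up and the active/max thread ratio is below the threshold."""
    if not status.container_up or status.max_threads <= 0:
        return False
    return status.active_threads / status.max_threads < max_busy_ratio


def is_ready(status: AppStatus) -> bool:
    """Ready: all startup jobs have run and no termination request was received."""
    return status.startup_jobs_done and not status.termination_requested


def http_status(ok: bool) -> int:
    """Map a probe verdict to a status code the kubelet would interpret."""
    return 200 if ok else 503
```

Keeping the two predicates separate matters: a pod that has received a termination request stops being ready, so traffic drains away, while it can still be alive and finish in-flight work.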
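
The pipeline on slide 35 promotes one artifact through qa, preview and production, using a canary strategy for the first and last stage. The talk does not show what the lmn_deploy* helpers do, so the sketch below is only a guess at the shape: it assumes a canary stage deploys a first instance, checks its health, then either continues with the full rollout or rolls back and stops the pipeline. All names here are hypothetical stand-ins written in Python rather than the pipeline's own DSL.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class DeployRequest:
    artifact: str
    version: int


class CanaryFailed(Exception):
    """Raised when the canary instance is unhealthy."""


def deploy_stable(req: DeployRequest, env: str, log: List[str]) -> None:
    # Assumption: "stable" means a plain full rollout.
    log.append(f"{env}: stable {req.artifact} v{req.version}")


def deploy_canary(req: DeployRequest, env: str, log: List[str],
                  is_healthy: Callable[[str], bool]) -> None:
    # Assumption: "canary" means one instance first, full rollout if healthy.
    log.append(f"{env}: canary {req.artifact} v{req.version}")
    if not is_healthy(env):
        log.append(f"{env}: rollback")
        raise CanaryFailed(env)
    deploy_stable(req, env, log)


def run_pipeline(req: DeployRequest,
                 is_healthy: Callable[[str], bool] = lambda env: True) -> List[str]:
    """Run qa (canary), preview (stable), production (canary); stop on failure."""
    log: List[str] = []
    try:
        deploy_canary(req, "qa", log, is_healthy)
        deploy_stable(req, "preview", log)
        deploy_canary(req, "production", log, is_healthy)
    except CanaryFailed:
        pass  # later stages are skipped
    return log
```

For example, `run_pipeline(DeployRequest("com.lastminute.application1", 2))` walks all three environments, while a health check that fails in qa leaves preview and production untouched.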
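
The figures on slide 40 are easier to compare once converted to common units. The arithmetic below uses only the slide's numbers; the interpretation is an assumption, in particular that the 11-minute rollout replaces the 55 instances one after another and that the quoted rates are sustained averages.

```python
# Figures quoted on slide 40.
ROLLOUT_MINUTES = 11
ROLLOUT_INSTANCES = 55
METRICS_PER_MINUTE = 1_500_000
REQUESTS_PER_SECOND = 1300
SERVICES_MIGRATED = 25
MIGRATION_MONTHS = 4


def derived_rates():
    """Convert the quoted figures to per-unit rates."""
    return {
        # Time budget per instance if instances are replaced sequentially.
        "seconds_per_instance": ROLLOUT_MINUTES * 60 / ROLLOUT_INSTANCES,
        "metrics_per_second": METRICS_PER_MINUTE / 60,
        "requests_per_minute": REQUESTS_PER_SECOND * 60,
        "services_per_month": SERVICES_MIGRATED / MIGRATION_MONTHS,
    }
```

Under those assumptions a graceful rollout spends about 12 seconds per instance, the monitoring stack ingests 25,000 metrics per second, and the team migrated a little over six services per month.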
