
OpenNebulaConf2018 - Our Journey to OpenNebula - Germán Gutierrez - Booking.com

The use-case story of Booking.com and what it took to get 1700 users up and running with OpenNebula.



  1. Our journey to OpenNebula. Germán Gutiérrez, Team Carmen
  2. Germán Gutiérrez, Linux Systems @Booking.com. 15+ years of experience in large-scale deployments, DevOps, Linux/Unix administration, DNS, web servers, MySQL, OpenLDAP, shell/Perl/Python scripting, and more. At Booking.com, leads the implementation and scaling of OpenNebula in the development environment, serving 1700+ tech and product users.
  3. Team Carmen: ● Nate Nuss - Team Leader ● Maria Scerbikova - Product Owner ● Giordano Fechio - System Administrator ● Lily Chen - Developer ● Omar Othman - Developer
  4. Special thanks to: ● Mariano Guezuraga - Senior System Administrator
  5. What this talk is about. What this talk is not about.
  6. Our size: ● Over 1000 hosts ● Over 13k VMs running ● Over 2000 users ● And counting...
  7. In short. May 2016: joined the company and a team of 2. October 2016: hackathon with OpenNebula. 2017, the year of OpenNebula: testing, partial migration, team growth. 2018: the final move and the current state.
  8. Our use case. ● Development environment ● Many templates/roles ● User oriented ● Template/role ownership ● Our team provides the infrastructure
  9. How it was. ● In-house solution (Perl, Python, Bash) ● Master & hosts ● Hosts with local storage ● VMs in the same network (bridged) ● The SoT (Puppet class, DNS, IP assignment) was an internal app for physical servers ● The scheduler was a cron job
  10. How it is. ● Network: OVS + VLANs ● Storage: ○ NFS/NetApp ○ Image / template datastores ● Tooling: a CLI in Python ● We are the SoT: ○ A web service in Python ○ IP assignment by ONe ○ DNS updated via a hook
  11. What worked with no issues.
  12. What didn’t work as expected. ● Networking: ○ One big network to rule them all ○ An incident ○ Lessons learned for the future.
  13. What didn’t work as expected. ● SoT, the source of truth: ○ The ONe API is slow*, so it needs a cache! ■ Tuning oned helped ■ What can possibly go wrong with a cache? ○ Lesson learned ■ Using the FQDN as an ID is bad design.
  14. What didn’t work as expected. ● Tooling: Python with python-oca ○ Supports ONe only up to 4.10 ○ python-oca’s last commit was in 2017 ○ Cumbersome to maintain ○ Lesson learned
  15. What didn’t work as expected. ● Shared storage on NetApp ○ Huge impact. ○ We had one volume ○ We feared space issues ○ We saw high CPU usage due to I/O ○ First actions: “easy” because of our use case. ○ “Solving” the issue. ○ Lesson learned.
  16. Where are we? ● Still caching: enter “bone”, a web service. ● As a team we are the MITM: ○ Working on a self-service page for troubleshooting. ○ The same for role/template owners. ○ We need to split the bone and self-service code ● The NFS issue is “gone” (expensive) ● WIP: rewriting the CLI in Ruby: brone.
  17. Where are we? (cont.) ● We still don’t know how to deal with retries. ● Networking: raised an issue to let us adopt an SDN, so that network drivers become pluggable like other parts of the system. ● We have wasted resources: users won’t destroy their VMs after use.
  18. Q & A
  19. Thank you
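Slide 10 mentions that DNS is updated via a hook. The talk does not show the actual hook, but in OpenNebula of that era (5.x) a state-triggered VM hook was registered in oned.conf roughly like the sketch below; the hook name, script path, and trigger state here are assumptions for illustration:

```
# oned.conf: run a script when a VM reaches RUNNING (sketch, not Booking.com's config)
VM_HOOK = [
    name      = "dns_update",
    on        = "RUNNING",
    command   = "/usr/share/one/hooks/dns_update.py",
    arguments = "$ID $TEMPLATE" ]
```

The script receives the VM ID and its base64-encoded template, from which it can extract the assigned IP and push the record to DNS.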
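Slide 13 describes putting a cache in front of the slow ONe API. A minimal TTL-cache sketch of that idea, in Python, is below; `fetch` stands in for a real XML-RPC call (e.g. a VM pool query), and all names are assumptions rather than Booking.com's actual code:

```python
import time

class TTLCache:
    """Serve a cached result of an expensive fetch for `ttl` seconds."""

    def __init__(self, fetch, ttl=30.0, clock=time.monotonic):
        self.fetch = fetch        # e.g. a wrapper around a slow ONe API call
        self.ttl = ttl
        self.clock = clock
        self._value = None
        self._expires = -1.0      # forces a real fetch on first use

    def get(self):
        now = self.clock()
        if now >= self._expires:  # stale (or never fetched): refresh
            self._value = self.fetch()
            self._expires = now + self.ttl
        return self._value
```

The "what can possibly go wrong with a cache?" point on the slide is exactly the weakness of this pattern: until the TTL expires, callers can see data that no longer matches reality.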
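Slide 17 notes wasted resources because users do not destroy VMs after use. One common remedy is an age-based sweep that flags long-lived VMs for a cleanup nudge; a small Python sketch follows, where the record fields (`name`, `created`) and the 14-day threshold are assumptions, not Booking.com's actual policy or schema:

```python
from datetime import datetime, timedelta

def stale_vms(vms, now, max_age_days=14):
    """Return the VMs older than `max_age_days`: candidates for cleanup.

    `vms` is a list of dicts with a 'name' and a 'created' datetime;
    in practice these would come from a VM pool query.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [vm for vm in vms if vm["created"] < cutoff]
```

A job like this could mail the owners listed on each flagged VM rather than deleting outright, which fits the talk's user-oriented, template-ownership model.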
