
How we scale up our architecture and organization at Dailymotion

At the end of 2016, Dailymotion revamped the whole company. In this presentation, we explain how we used the DevOps mindset as an enabler to scale up our engineering team and our architecture.



  1. From a French monolith to a worldwide platform: a human story
  2. Stan Chollet, Chapter Lead Core API, Tribe Scale @ Dailymotion. https://stan.life. President of Association Orléans Tech. Kubernetes & GraphQL trainer.
  3. Dailymotion, one of the leading video destination platforms in the world: 3 billion video views per month, 300 million unique visitors per month, 150 million videos in our catalogue.
  4. OUR MISSION: transforming our video platform into a global destination for must-see videos. Building the best "go-to" experience where users can get their daily dose of must-see videos, and partners can leverage the latest tools to grow and monetise their audience.
  5. FROM MONOLITH TO SOA: our road to a micro-service architecture. FROM: monolith LAMP stack; hosted on bare metal; mono-datacenter (Paris); REST API; full-stack website. TO: geo-distributed; apps run in containers (Docker); orchestrated on top of Kubernetes; multiple languages (mainly Python / Golang); GraphQL API; fully API-centric.
  6. GRAPHQL, AN ENABLER FOR OUR FRONTEND AND OUR BACKEND. FROM: the PHP monolith serving the website (HTML) and a REST API. TO: a GraphQL API in front of independent services (svc 1 in Python, svc 2 in Golang, svc 3 in Java).
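The slide above shows the gateway pattern: one GraphQL layer fanning out to several backend services. A minimal sketch of that pattern in Python; the service calls, field names, and values are illustrative stand-ins, not Dailymotion's real schema.

```python
# Hypothetical sketch of a GraphQL-style gateway resolver: one query
# aggregates data owned by several backend services.

def fetch_video(video_id):
    """Stand-in for a call to a video service (e.g. the Python service)."""
    return {"id": video_id, "title": "A must-see video"}

def fetch_owner(video_id):
    """Stand-in for a call to a user/partner service (e.g. the Golang service)."""
    return {"name": "some-partner"}

def resolve_video(video_id):
    """Aggregate both backends into one response, as the GraphQL layer would,
    so frontends make a single request instead of one per service."""
    video = fetch_video(video_id)
    video["owner"] = fetch_owner(video_id)
    return video
```

The point of the pattern is that frontends depend only on the graph, so a field can move from one backend service to another without any client change.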
  7. TRIBES? SQUADS? Our organization is split into tribes; each tribe is made of squads, and chapters group people with the same skills across the squads of a tribe.
  8. SOA AS AN ORGANIZATIONAL ENABLER. With the monolith (mono-datacenter, HTML website + REST API), ownership was mixed. With the geo-distributed SOA, the GraphQL layer is owned by an enabler/product team, while the Data, User, and Partner services are each owned by product tribes.
  9. FIRST STEP (September 2016 to January 2017): FOUNDATIONS. A GraphQL API in front of the legacy PHP REST API and a Python search service, running on Kubernetes on AWS. Built & managed by one team (2 people); deployed in 3 regions on AWS; orchestrated on top of Kubernetes; apps deployed with custom bash scripts; good application monitoring; poor infrastructure monitoring.
  10. SECOND STEP (January 2017 to June 2017): TIME TO SCALE. People: from 2 to ~30 people, from 1 to 5 teams. Services: from 1 to ~15 services, from 1 to ~10 languages/technologies. Releases: from an average of 1 deployment per day to more than 10.
  11. HUMAN FIRST: from 2 to ~30 people. Hired more than 30 people over a couple of months; organised training sessions for newcomers; optimised and reviewed our on-boarding process; optimised the way we work on an SOA stack; evangelised (GraphQL + infrastructure).
  12. Only one dependency on the developer's laptop: Docker. Simplify the technical on-boarding process; simplify switching between our 500+ repositories; use generic task names to launch code-quality checks; let developers use the technologies they want. Generic tasks: make style, make test, make test-unit, make test-functional, make test-integration, make complexity, make run.
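Every repository exposes the same generic targets and maps them to its own tooling (the convention is documented at https://gazr.io, linked on the closing slide). A hypothetical Makefile for one Python service; the tools behind each target (flake8, pytest) are illustrative choices, and a Golang repository would map the same names to its own commands:

```makefile
# Hypothetical Makefile: same generic target names in every repository,
# each mapped to this project's own tooling.
.PHONY: style test test-unit run

style:            ## check code style
	flake8 .

test: test-unit   ## run every test suite this project has

test-unit:        ## run unit tests only
	pytest tests/unit

run:              ## run the service locally; Docker is the only dependency
	docker compose up
```

Because the names never change, a developer can switch projects and immediately run `make test` without reading that repository's documentation.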
  13. FROM AWS TO GCP: from 1 service to 10 services. Worldwide network (subnets can be routed from one region to another); anycast ingress IP, easy to set up; a hosted, managed Kubernetes service with features such as node autoscaling; connection to Dailymotion's private network in Paris; currently deployed in 3 regions across the world (~80 nodes).
  14. NEW HIGHLY SCALABLE HYBRID ARCHITECTURE. Geo-distributed for high performance everywhere in the world; hybrid infrastructure on premises together with Google Cloud; auto-scaling that adapts to the audience. (Google Cloud POPs, an on-premise POP, and a CDN.)
  15. GIVE ROOT ACCESS TO DEVELOPERS 😎 From 1 deployment per day to more than 10. Implement continuous deployment (except production, which needs human approval); let developers deploy by themselves; delegate the deployment workflow to developers through a Jenkinsfile (Pipeline); enforce common interfaces, minimum code quality, and deployment guidelines built by the devops team.
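The "continuous deployment except production" gate above maps naturally onto a declarative Jenkinsfile, where Jenkins's `input` step pauses a pipeline until a human approves. A minimal sketch; the stage names and deploy scripts are hypothetical, not Dailymotion's actual pipeline:

```groovy
// Hypothetical declarative Jenkinsfile: everything is automatic
// except the production deploy, which waits for a human approval.
pipeline {
    agent any
    stages {
        stage('Test')           { steps { sh 'make test' } }
        stage('Deploy staging') { steps { sh './deploy.sh staging' } }
        stage('Approval') {
            // The pipeline blocks here until someone clicks "Proceed".
            steps { input message: 'Deploy to production?' }
        }
        stage('Deploy production') { steps { sh './deploy.sh production' } }
    }
}
```

Keeping the workflow in a Jenkinsfile inside each repository is what lets developers own their own deployments while the devops team enforces the shared guidelines.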
  16. WE ARE LEARNING FROM OUR MISTAKES. From 1 deployment per day to more than 10. Step #1: at first we deployed our applications sequentially, region by region, using bash scripts. Step #2: we wanted to manage our clusters from a single API endpoint: Federation. Some API objects were missing from Federation, which forced mixed deployment methods: some objects in Federation, others deployed region by region. Step #3 (déjà-vu): now we deploy our applications sequentially, region by region, using Helm.
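Step #3 above can be sketched as a small driver that issues one `helm upgrade` per region, sequentially, so a failure in one region stops the rollout before it reaches the next. The region names, kube-context naming scheme, and chart path below are illustrative assumptions:

```python
# Hypothetical sketch of a sequential, region-by-region Helm rollout.
import subprocess

REGIONS = ["us-east1", "europe-west1", "asia-east1"]  # assumed region names

def helm_command(release, chart, region, image_tag):
    """Build the helm upgrade command targeting one region's cluster."""
    return [
        "helm", "upgrade", "--install", f"{release}-{region}", chart,
        "--kube-context", f"k8s-{region}",      # assumed context naming
        "--set", f"imageTag={image_tag}",
    ]

def deploy_all(release, chart, image_tag, run=subprocess.run):
    """Deploy sequentially; check=True aborts on the first failing region,
    so a bad release never reaches every region at once."""
    for region in REGIONS:
        run(helm_command(release, chart, region, image_tag), check=True)
```

The `run` parameter is injected only so the rollout logic can be exercised without a real cluster.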
  17. CHARTS EVERYWHERE! From 1 deployment per day to more than 10. Helm charts manage dependencies between our applications; deploy a complete stack with a single command; help us manage different environments/regions within a chart; make rollbacks easy (each deployment has a unique revision id). Ongoing: provisioning a staging environment per pull request.
  18. HOW WE OPERATE OUR PLATFORM: from an SLA of 99.999% to 99.9999999999999999999999999999999999%. APM with the OpenTracing specification; monitoring/alerting; a logging specification for each service; feature flipping, progressive rollout, experimentation (A/B).
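Progressive rollout, as mentioned above, is commonly implemented by hashing a stable user id into a percentage bucket: a feature is on for a user when their bucket falls under the flag's percentage, and the same user keeps the same answer as the percentage grows. A minimal sketch of that common technique, not Dailymotion's actual implementation:

```python
# Hypothetical sketch of deterministic percentage-based feature rollout.
import hashlib

def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Place user_id in one of 100 stable buckets per feature; the flag is
    on when the bucket is below the rollout percentage."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent
```

Hashing the feature name together with the user id keeps the buckets independent across features, so the same 10% of users is not always the guinea pig for every experiment.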
  19. WE ARE NOT ROBOTS: from software/system engineers to production engineers. BUILD: software engineers write code but build applications that aren't easy to operate. SHIP: release engineers package & deploy applications. RUN: system engineers operate infrastructure & apps but are unable to fix applications by themselves. TO BUILD/SHIP/RUN: production engineers can build applications, package & deploy them, operate them in production, build applications with a "RUN" mindset, and build tools for software engineers.
  20. BOOM! helm upgrade --install westeros --reuse-values --set imageTag=30610c5 dailymotion/westeros-gbased-raulicache. WHAT: a bad parameter applied on a helm command; 3 clusters emptied (~1,300 containers); all our products were unusable. AND: we were down for 19 minutes: ~10 minutes to be notified, ~7 minutes to understand, ~2 minutes to recover the entire architecture from scratch. NOW: growing up: wrap destructive commands; improve monitoring.
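"Wrap destructive commands" can start as small as a guard in front of the CLI that refuses to run risky subcommands without an explicit confirmation. The subcommand list and function below are illustrative, not the wrapper the team actually built:

```python
# Hypothetical sketch of a guard wrapped around destructive CLI commands.
DESTRUCTIVE_SUBCOMMANDS = {"delete", "uninstall", "rollback"}  # illustrative

def guard(argv, confirmed=False):
    """Return True if argv may run. Destructive helm subcommands are
    refused unless the caller has explicitly confirmed them."""
    if len(argv) > 1 and argv[0] == "helm" and argv[1] in DESTRUCTIVE_SUBCOMMANDS:
        return confirmed
    return True
```

A real wrapper would also prompt interactively and log every refused invocation; the essential idea is that the dangerous path requires one extra deliberate step.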
  21. TO INFINITY AND BEYOND. And now? Hybrid architecture (on premises); stateful use cases: manage volume provisioning the same way we orchestrate applications; performance improvements (service mesh); security: user authentication and auditing, secrets encryption; open-source our GraphQL engine (Python, performance-oriented).
  22. Thank you. https://gazr.io
