2. About me
● Vedran – a developer
● @ Booking.com for 2 years
● Team lead and product owner of the Event System team
3. About Booking.com
● We sell room nights
● Over 650 000 per day
● More than 400 000 hotels
● We employ over 6500 people (globally)
● Hundreds of developers and designers working on code and templates
● Thousands in: customer care, hotels department, content department, etc.
4. ● Dozens of deployable systems
● And we really like rolling them out...
● Yesterday: 50 roll-outs
● (+ experiments)
● This is a regular day at Booking.com
5. Organization & Structure
● Small teams …
● … that own the systems they work on.
● Flat hierarchy
● Self steering
● Frontend and Backend
6. Workflow
● Beyond Scrum
● Standup and “Scrum of Scrums”
● Small steps
● Progress and priorities constantly tracked and
managed
● Shared codebase
● No formal QA step (before production)
● Failure is OK!
7. Development and Testing
● DQS – Our test environment
● Virtual machines mirroring production
environments
● Managed by a “self-build” tool
● Backed by great tooling
● No QA to get in the way!
● Next stop: production!
8. Deployment
● Get new code to servers reliably and
quickly
● Sometimes: get old code back even faster
● git-deploy: github.com/git-deploy
● Integrated with other systems - simple and
robust
● Flexible
10. ● Ownership of deploy
● Hotfix
● Incident handling
● Communication, communication,
communication
11. Monitoring
● We want to monitor everything
● Focus on relevant metrics
● Event-logging system – information
aggregator
● Information is conveyed by events
● Free-form and accessible data feed
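The talk does not show what an event looks like, so the following is only a minimal sketch of the idea: each event is a free-form record, and an aggregator reads the feed and summarizes it. The JSON-lines encoding, the field names (`type`, `epoch`) and both helper functions are assumptions made for illustration, not the Event System's actual format.

```python
import json
import time


def make_event(event_type: str, **fields) -> str:
    """Serialize one free-form event as a single JSON line (assumed format)."""
    event = dict(fields)              # arbitrary, schema-less payload
    event["type"] = event_type        # hypothetical field name
    event["epoch"] = int(time.time()) # hypothetical field name
    return json.dumps(event, sort_keys=True)


def count_by_type(lines):
    """Aggregate a feed of JSON-line events into per-type counts."""
    counts = {}
    for line in lines:
        event = json.loads(line)
        counts[event["type"]] = counts.get(event["type"], 0) + 1
    return counts


if __name__ == "__main__":
    feed = [
        make_event("web_request", status=200),
        make_event("web_request", status=500),
        make_event("deploy", system="frontend"),
    ]
    print(count_by_type(feed))
```

Because events are free-form, a consumer only needs the few fields it cares about and can ignore the rest.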
13. Monitoring - Graphite
● Cluster nodes in the hundreds
● Tens of millions of metrics per minute
● Custom modifications:
● github.com/grobian/carbon-c-relay
● github.com/dgryski/carbonzipper
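As a rough illustration of what a single metric looks like on the wire, the sketch below formats samples for Carbon's plaintext protocol (`path value timestamp`, one per line, port 2003 by default) and sends them over TCP. The metric name, host and helper functions are hypothetical; in a setup like the one described, the receiving end could be a relay such as carbon-c-relay rather than a Carbon daemon directly.

```python
import socket
import time


def format_metric(path: str, value, timestamp: int) -> str:
    """Format one sample for Carbon's plaintext protocol."""
    return f"{path} {value} {timestamp}\n"


def send_metrics(lines, host="localhost", port=2003):
    """Send pre-formatted lines to a Carbon plaintext listener.

    host and port are placeholders; 2003 is Carbon's default plaintext port.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    now = int(time.time())
    # "app.web.requests.count" is a made-up metric path for illustration.
    print(format_metric("app.web.requests.count", 42, now), end="")
```

Batching many lines into one `send_metrics` call keeps connection overhead low when an application emits a large number of metrics.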
14. ● Graphite is: versatile, simple, fast...
● …but not a silver bullet
● It kills SSDs!
● Application-level metrics + full server health monitoring
16. Monitoring – tools
● Time-proven tools used daily:
● Live application monitoring (Landweg / Graphite dashboards)
● Server health monitoring (Graphite tools)
● Error/warning aggregation (show_errors)
● Real-time business reporting (Controlrooms)
● Constantly improving
17. ● The Experiment Tool
● Backbone tool for Frontend teams
● Real-time experiment data
● Complex analytics and breakdowns
18. ● Shadowapp
● Our “canary in the coalmine”
● Runs code from “trunk” against real requests
● Smokes out subtle bugs and issues
● Edge cases by real users, not developers
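The slides do not describe how Shadowapp is built. As a minimal sketch of the general idea, assuming requests can be replayed against two handlers, the code below runs each request through the live code and a candidate ("trunk") build and records any difference or crash. The handler functions, the request shape and `shadow_compare` are all hypothetical.

```python
from typing import Callable, Dict, List

Request = Dict[str, str]
Handler = Callable[[Request], Dict[str, object]]


def shadow_compare(requests: List[Request], live: Handler, candidate: Handler):
    """Replay requests against both handlers and collect mismatches."""
    mismatches = []
    for request in requests:
        expected = live(request)
        try:
            actual = candidate(request)
        except Exception as exc:  # a crash in trunk is itself a finding
            mismatches.append(
                {"request": request, "expected": expected, "error": repr(exc)}
            )
            continue
        if actual != expected:
            mismatches.append(
                {"request": request, "expected": expected, "actual": actual}
            )
    return mismatches


def live_handler(request: Request) -> Dict[str, object]:
    """Stand-in for the code currently in production."""
    return {"total": int(request["nights"]) * 100}


def trunk_handler(request: Request) -> Dict[str, object]:
    """Stand-in for trunk; it has a subtle bug when nights is 0."""
    nights = int(request["nights"])
    per_night = (nights * 100) // nights  # ZeroDivisionError for nights == 0
    return {"total": per_night * nights}


if __name__ == "__main__":
    traffic = [{"nights": "2"}, {"nights": "0"}]
    for mismatch in shadow_compare(traffic, live_handler, trunk_handler):
        print(mismatch)
```

The `nights == 0` case is the kind of edge case a developer may not think to test but real traffic eventually produces.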
19. Challenges
● Keeping it flat
● Deployment complexity
● Deployment speed
● Guiding the constant creation of new data and monitoring systems
● Scaling event-logging and Graphite