This document discusses Bakson's efforts to implement continuous integration, delivery, and deployment practices for Ticketmaster's API team. It outlines the tools used such as Gitlab, Jenkins, SonarQube, Nexus, Rundeck, and Gatling. Automation is triggered upon code commits to run tests and deploy to environments. Testing occurs for each microservice rather than all services at once. This allows faster feedback loops while deploying features. The goal is to deploy to production continuously while ensuring quality and stability.
2. A bit of context…
Ticketmaster
Global ticket sales and distribution company.
A cliché, but the global leader in its line of business.
Large IT operation.
Engineering HQs in Los Angeles and London.
More than 150 platforms/products.
Both legacy systems and cutting-edge technologies.
Bakson
Belgrade-based IT company.
Ticketmaster’s development centre.
Currently around 50 people, engineering only.
Mainly Java projects.
Strong in the local Scala community.
3. A bit of context…
Why emphasise “quality”?
4. A bit of context…
Because of the business people
Each high-priority production bug (Business Disruption) can be
directly linked to, and measured in, lost money.
Bug? Fans can’t purchase tickets.
Bug? Fans can’t enter the venue.
5. A bit of context…
Because of the fans
Adele’s entire European tour sold out in two days, in less than 15 minutes on each day.
Huge success. But…
7. A bit of context…
How can DevOps help teams?
And how do we get “there”?
8. A bit of context…
DevOps Maturity Model
Company-wide initiative.
Assessed by Gartner.
18 categories - “Deployment”, “Support”, etc.
Products are required to “move” through the matrix.
Progress is constantly evaluated.
Additional benefits: standardisation, guidance.
Last phase targets: Canary release, Chaos Monkey, etc.
9. Public API
HTTP service. Not RESTful.
Close to 100 endpoints/actions.
2 years live in production.
Development + QA team size = 10 people
10. Public API
Distributed architecture (microservices).
Java stack.
Storage: relational, NoSQL, search engines…
APIGEE as management layer.
Each microservice has its own source code repository.
11. Issues list
1) Long-lasting regression campaign
A week of testing once development of the release is completed.
2) Late integration with clients
Deployed to the shared environment only after the entire release is developed.
3) Non-standardised deploy procedures
A variety of tools, or even manual steps. Procedures differ from env to env.
4) Difficult to pinpoint the root cause of broken functionality
Automated testing runs on the entire release, and clients also test only the entire release build.
12. The goal
Start automation on feature completion (code pushed to repository):
Run Unit Tests -> Do Static Code Analysis -> Build & Save Package -> Deploy -> Check Service Status -> Run Integration Tests -> Send Reports
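The flow above can be read as a fail-fast sequence: a stage runs only if the previous one succeeded, and whatever was collected is reported at the end. The following is a minimal Python sketch of that idea, not the team's actual Jenkins/Rundeck setup. The stage callables and the `fail_at` switch are hypothetical placeholders for illustration.

```python
from typing import Callable, List, Tuple

# Stage names follow the slide; the callables are hypothetical placeholders.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order, stop at the first failure, and return a report."""
    report = []
    for name, action in stages:
        ok = action()
        report.append(f"{name}: {'OK' if ok else 'FAILED'}")
        if not ok:
            # Fail fast: later stages (e.g. Deploy) must not run on a broken build.
            break
    return report


def example_stages(fail_at: str = "") -> List[Stage]:
    """Build illustrative stages; a stage fails only if its name equals fail_at."""
    names = [
        "Run Unit Tests",
        "Do Static Code Analysis",
        "Build & Save Package",
        "Deploy",
        "Check Service Status",
        "Run Integration Tests",
    ]
    return [(n, (lambda n=n: n != fail_at)) for n in names]
```

The returned list stands in for the "Send Reports" step: it is what would be mailed or posted once the run ends, whether it ended in success or at a failed stage.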
13. Tool - GitLab
Git repository management tool.
Many additional features: code review, continuous integration, deploy…
On premise or SaaS.
Free and Commercial editions.
The first point in our flows: upon code push, webhooks
trigger the next tool in the flow.
(Note: our first AWS-based service is using GitLab’s own CI, but that is WiP.)
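As a rough illustration of the webhook step, the sketch below decides whether a GitLab push event should start a job. It relies on the `object_kind` and `ref` fields of GitLab's push event payload. The watched branch default and the `<project>-build` job naming are assumptions made for this example, and in practice a Jenkins plugin would normally do this filtering.

```python
import json
from typing import Optional


def job_to_trigger(payload: str, watched_branch: str = "develop") -> Optional[str]:
    """Return the name of the job to start for a GitLab push event, or None."""
    event = json.loads(payload)
    if event.get("object_kind") != "push":
        return None  # ignore merge request, tag and other events
    if event.get("ref") != f"refs/heads/{watched_branch}":
        return None  # pushes to other branches do not start the flow
    project = event.get("project", {}).get("name", "unknown")
    # Hypothetical naming convention: "<project>-build".
    return f"{project}-build"
```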
14. Continuous Integration
Start automation on feature completion (code pushed to repository):
Run Unit Tests -> Do Static Code Analysis -> Build & Save Package -> Deploy -> Check Service Status -> Run Integration Tests -> Send Reports
Continuous Integration (CI) is a development practice that requires developers to integrate code
into a shared repository several times a day. Each check-in is then verified by an automated build,
allowing teams to detect problems early.
- Martin Fowler
15. Tool - Jenkins
Automation server.
Gains additional power from the numerous plugins available.
Open source. Available only on premise.
The main unit is the “job”.
At TM, jobs can be created only through the code repository. Creation via the GUI is disabled.
Two configuration XML files are part of the application code.
Reasoning:
- (distributed) versioning
- easy to restore in case of issues with Jenkins server
- easy to migrate between Jenkins instances
17. Tool - SonarQube
Platform for continuous inspection of code quality.
More than 20 programming languages are covered.
Open source. Available only on premise.
Some TM teams fail the Jenkins job on code quality violations.
The API team reviews reports per Sprint/Release.
Using FindBugs as a plugin.
20. Tool - Nexus
Artifact repository.
Free or Commercial. Available only on premise.
Supports multiple platforms out of the box (Java, NPM, Docker…).
The TM instance is locked against manual upload of artifacts.
Only Jenkins instances can upload, through a predefined Release plugin.
Supports the release process: only “promoted” artifacts are available for Production deploy.
The API team reviews reports per Sprint/Release.
23. GitFlow
Branching model, introduced by Vincent Driessen and popularised by Atlassian.
A feature/task is merged to “develop”
on completion (as per the “Definition of Done”).
A “release” branch is created on demand.
“Release” is merged to “master”
when ready for production.
This helps answer “When?”: on merge to “develop”.
24. Where?
The major problem…
…it’s not developing tests
…it’s not creating environments
…it’s not even about automating the whole thing.
IT’S ALWAYS DATA.
25. Data setup
Because you (usually) can’t control data
in your dependencies.
Use mocks
Easier to develop initially.
Difficult to maintain.
Requires tracking the evolution of dependencies.
Allows easier setup of testing environments.
Permanent data sets
Allow testing in a “real” environment.
Difficult to develop initially.
Easier to maintain, since the owners of your data
will have to migrate it together with the rest of their data.
We decided to go with permanent sets!
There is a creation tool available on TM backends.
26. API environments
Environments: DEVs, QAs, TPI, CAP, Stage, Production(s).
Stage and Production have SLAs defined.
Mapping to Gitflow:
“develop” -> TPI, “release” -> Stage, “master” -> Prod
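The mapping above is small enough to express as a lookup. A minimal sketch, assuming release branches follow the common Gitflow naming `release/<version>` (the slides do not state the exact naming used) and that any other branch, such as a feature branch, deploys nowhere:

```python
from typing import Optional

# Mapping taken from the slide. Matching on the prefix before "/" is an
# assumption about branch naming, e.g. "release/1.4.0".
BRANCH_TO_ENV = {"develop": "TPI", "release": "Stage", "master": "Prod"}


def target_environment(branch: str) -> Optional[str]:
    """Return the environment a branch deploys to, or None if it has no target."""
    prefix = branch.split("/", 1)[0]
    return BRANCH_TO_ENV.get(prefix)
```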
27. Where?
Each service (should) have its own integration tests.
Test everything!!!
But for the API it is crucial that on the Gateway
“everything works”.
28. Tool - Rundeck
Tool for runbook automation and execution of arbitrary management tasks.
Open source. Available only on premise.
Is Rundeck even needed if you already use Jenkins?
- “Rundeck is made for Operations
and knows about the details of your environments.”
- “Jenkins is fundamentally not a deployment tool,
although it can be used like one.”
29. QA framework
Separate project with its own source-code repo.
Implemented in Java. Maven project.
Uses standard HTTP clients and Java testing libs (JUnit, TestNG).
Used for functional testing.
Blackbox testing of our services (no DB access, log checks…)
Smoke suite: ~1,000 tests, ~5 mins to execute
Regression suite: ~10,000 tests, ~35 mins to execute
Every feature or bug we have ever had is covered in the regression suite.
We constantly support 2 API versions, with test suites covering both.
30. Implementation issues…
1) Limit the Jenkins “job” to be executed only from “develop”
- Branching out for a new feature results in an identical copy of the Jenkins XML configs.
- Jenkins plugins have limited support for conditional execution in some phases.
2) QA “job” to be triggered only by the service’s “develop”
- Another set of conditionals/variables to be set/passed between jobs.
3) Know which services are involved in a feature
- The only way to cover all cases/features is to always deploy and test all services.
31. Try with job chaining
Standard Jenkins "freestyle" jobs support
simple sequential task execution.
Doesn’t work in our case.
- Git triggers would result in service restarts
while test execution is active.
Another idea was to introduce an additional branch
so that the entire flow would not be triggered from “develop”.
- Additional work/thinking required from developers.
- Where to place the “signal” that would trigger the entire flow?
32. Try with plugins
The “closest” to what we need was found in the “JobFanIn” plugin.
This plugin provides a watch on upstream projects
to trigger downstream projects
once all upstream projects are successfully built.
Doesn’t work in our case.
- Impossible to predict which services a feature will reside on.
33. Step back. Rethink.
Do we really need to deploy and test everything, always?
Does this approach actually fit a microservices architecture?
LET’S SIMPLIFY.
For each service, deploy and test only itself.
Yes, developers will need to do additional thinking
when finishing a feature that spans multiple services.
34. Testing agreement
On merge to “develop” (as per Gitflow).
Deploy to a live environment - TPI.
Use permanent data sets.
Each microservice (and the gateway) will have an accompanying QA framework.
Upon service deploy, execute it.
If a feature spans multiple microservices, it is up to the developers to sequence the testing.
37. The goal
Start automation on feature completion (code pushed to repository):
Run Unit Tests -> Do Static Code Analysis -> Build & Save Package -> Deploy -> Check Service Status -> Run Integration Tests -> Send Reports
38. Deploy validation
Via healthchecks.
Internally exposed HTTP endpoints that provide
a summary of the statuses of dependencies and internals.
Every product must implement this TM standard.
The response must be quick.
The healthcheck status is composed by a background job.
Healthchecks are used in monitoring
and by load balancers.
Rundeck/Jenkins will fail the job if the healthcheck is negative.
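The two halves of this slide, a background job that composes the status and a deploy step that fails on a negative healthcheck, can be sketched as below. This is an illustration only: the JSON shape (`status`, `dependencies`) is a hypothetical format rather than the TM standard, and `fetch` stands in for an HTTP GET against the service's healthcheck endpoint.

```python
import time
from typing import Callable, Dict


def compose_status(checks: Dict[str, Callable[[], bool]]) -> dict:
    """Background-job step: run every dependency check and build a snapshot.

    The endpoint then serves this cached snapshot, which keeps responses quick.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "OK" if check() else "FAILED"
        except Exception:
            results[name] = "FAILED"  # a crashing check counts as a failure
    healthy = all(value == "OK" for value in results.values())
    return {"status": "OK" if healthy else "FAILED", "dependencies": results}


def wait_until_healthy(fetch: Callable[[], dict], attempts: int = 5,
                       delay_s: float = 0.0) -> bool:
    """Deploy-validation step: poll the healthcheck; False means fail the job."""
    for _ in range(attempts):
        if fetch().get("status") == "OK":
            return True
        time.sleep(delay_s)  # give the service time to finish starting up
    return False
```

In a real job `delay_s` would be a few seconds, so that a freshly restarted service has time to report a positive status before the job is failed.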
41. CD vs CD
Continuous Delivery is about keeping your application in a state
where it is always able to deploy into production.
Continuous Deployment is actually deploying every change into production,
every day or more frequently.
- Martin Fowler
42. CD vs CD
Why not all the way to Production?
“We (API) are only half of the product.” - Vanja Radaković (Product Manager)
Even if all tests on the API pass, that doesn’t mean no functionality is broken on our clients.
We “sit” in the Stage env for a week, waiting for sign-off from major clients,
between the release being ready and it actually being deployed to Production.
Environments: DEVs, QAs, TPI, Stage, Production(s).
43. Automating the security
Veracode is a platform for application security scanning.
Commercial. Available only as SaaS.
We have added a branch that (via GitLab and Jenkins)
automatically uploads artifacts to Veracode.
Because the scan takes a long time, it is not included
in the regular flow on feature completion.
There are company-wide policies defined.
We review the status once per Sprint/Release.
45. Performance testing (WiP)
Running on a dedicated environment.
Same topology (number of servers) and data size as in production.
Our production data is imported on demand.
All backend dependencies are mocked, because provisioning their data is difficult.
The TPI environment we use for functional testing contains inconsistent and insufficiently large data.
Mocks are based on logs from production.
API mocking tool - WireMock.
What if you need to mock something other than an HTTP API, like storage?
Rethink your architecture.
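To make the "mocks from production logs" idea concrete, the sketch below turns one logged request/response pair into a WireMock JSON stub mapping (`request` with `method` and `url`, `response` with `status`, `headers` and `body`). The log entry field names are hypothetical, since the slides do not describe the payload log format.

```python
import json


def stub_from_log_entry(entry: dict) -> str:
    """Turn one logged request/response pair into a WireMock stub mapping (JSON)."""
    mapping = {
        "request": {
            "method": entry["method"],
            "url": entry["url"],  # exact match on path plus query string
        },
        "response": {
            "status": entry["status"],
            "headers": {
                # Hypothetical default when the log has no content type.
                "Content-Type": entry.get("content_type", "application/json")
            },
            "body": entry["response_body"],
        },
    }
    return json.dumps(mapping, indent=2)
```

Each generated document can be saved as a file in WireMock's `mappings` directory, one file per stub.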
46. Tool - Gatling
Load testing framework.
Open source.
Supports code written in Scala or Java.
Can be executed from command line.
Easy to integrate with Jenkins using the official plugin.
49. Performance testing ideas
Automate in a way similar to security scanning - new branch.
Jenkins to build.
Rundeck to deploy.
Gatling to execute tests.
Bonus:
Attach APM tooling that would provide insights during testing.
Currently evaluating New Relic and Ruxit.
50. Logging
Company standard is to separate logs:
• application log
• payload log (inbound/outbound)
• performance log
Only application logs are indexed.
The others are available on servers for N days (depending on retention policy).
A unique “Correlation ID” allows tracking of requests
through multiple services and all types of log files.
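A Correlation ID only works if every service reuses the ID it received, forwards it on downstream calls, and writes it to each log type. A minimal sketch of those three steps follows. The header name `X-Correlation-ID` and the log line layout are assumptions for illustration, as the slides do not specify either.

```python
import uuid
from typing import Dict

# Hypothetical header name; the slides do not say which one TM uses.
CORRELATION_HEADER = "X-Correlation-ID"


def ensure_correlation_id(incoming: Dict[str, str]) -> str:
    """Reuse the caller's Correlation ID, or create one at the edge of the system.

    Note: this lookup is case-sensitive, whereas real HTTP header names are not.
    """
    return incoming.get(CORRELATION_HEADER) or str(uuid.uuid4())


def outbound_headers(correlation_id: str) -> Dict[str, str]:
    """Headers to attach to every downstream call made while handling the request."""
    return {CORRELATION_HEADER: correlation_id}


def log_line(log_type: str, correlation_id: str, message: str) -> str:
    """Format a line so the same ID appears in application, payload and performance logs."""
    return f"[{log_type}] cid={correlation_id} {message}"
```

Searching all log files for a single `cid=` value then reconstructs the path of one request across services.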
51. Tool - Splunk
Platform for operational intelligence.
Much more than log aggregation (searching, monitoring and visualisation).
On premise or SaaS.
Free and Commercial editions.
Our dashboards: relationships between HTTP errors (not application errors) and clients.
Our alerting: on a detected deviation/increase in the volume of errors.
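One simple way to define a "deviation" in error volume is to compare the current count with a baseline built from recent intervals. The sketch below shows only that logic in plain Python. It is not Splunk's API or the team's actual alert definition, and the threshold of `k` standard deviations is an assumed parameter.

```python
from statistics import mean, pstdev
from typing import Sequence


def error_volume_alert(history: Sequence[int], current: int, k: float = 3.0) -> bool:
    """Alert when the current error count exceeds the baseline by k std devs."""
    if len(history) < 2:
        return False  # not enough data for a baseline
    baseline = mean(history)
    spread = pstdev(history)
    # With a perfectly flat history (spread == 0), any increase triggers an alert.
    return current > baseline + k * spread
```

Here `history` would hold HTTP error counts per interval (for example per 5 minutes) for one client, and `current` the count for the latest interval.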
54. Benefits we (Dev team) got
1) Automation on “feature completion”
Less thinking for developers.
Quicker test and feedback cycles.
2) Using the same tools for all environments
Feeling very comfortable during production deploys.
3) Visibility of changes and metrics
Being able to react quickly, or even take preemptive action.
4) Company initiatives as guidance
No need to “reinvent the wheel”.
Shared knowledge. Contributing to solutions.
55. Feel free to contact us:
office@bakson.rs
Thanks for listening!