Snap CI enables software teams to practice Continuous Delivery (CD). The goal of CD is to automate the deployment process and build software in such a way that it can be deployed to production at any time. As a deployment tool, Snap CI cannot afford downtime: if it were down, our users would not be able to deploy their own software. We had to change Snap CI’s architecture to ensure zero downtime, and we chose blue-green deployments to achieve it. This approach required us to maintain two instances of our system: one active and one inactive. Drawing on our experience, we will share some tricks of the trade from the challenges we faced, such as making the application aware of whether it was active or inactive, handling data migrations, and babysitting long-running jobs.
These are the slides from Akshay Karle and Fernando Junior's presentation at Agile Brazil 2015.
26. OUR DEPLOYMENT
▫︎ Do have automated scripts
▫︎ Deployment pipeline
▫︎ 1-click deploy
▫︎ Sort of…
[Diagram labels: Babysitters, LB, web server, build server, database]
27. OUR DEPLOYMENT
▫︎ Wait for all builds to finish
▫︎ Put app in maintenance mode
▫︎ No new requests picked up
▫︎ Deploy and wait for migrations
[Diagram labels: Babysitters, LB, web server, build server, VZHOST, database]
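The sequence on this slide can be sketched as a small Ruby class. This is only an illustration that follows the order listed on the slide; `MaintenanceDeploy`, `running_builds`, and `accept_request?` are hypothetical names, not part of Snap CI.

```ruby
# Hypothetical sketch of a maintenance-mode deploy; all names are assumptions.
class MaintenanceDeploy
  attr_reader :log

  # running_builds: callable returning the number of builds still in flight.
  def initialize(running_builds:, poll_interval: 0)
    @running_builds = running_builds
    @poll_interval = poll_interval
    @maintenance = false
    @log = []
  end

  # New requests are picked up only outside maintenance mode.
  def accept_request?
    !@maintenance
  end

  def run
    # 1. Wait for all builds to finish.
    sleep(@poll_interval) while @running_builds.call > 0
    @log << :builds_drained
    # 2. Put the app in maintenance mode so no new requests are picked up.
    @maintenance = true
    @log << :maintenance_on
    # 3. Deploy and wait for migrations (supplied by the caller as a block).
    yield if block_given?
    @log << :deployed
  ensure
    # Leave maintenance mode even if the deploy step raised.
    @maintenance = false
    @log << :maintenance_off
  end
end

remaining = [2, 1, 0]
deploy = MaintenanceDeploy.new(running_builds: -> { remaining.shift || 0 })
deploy.run { puts "deploying; accepting requests? #{deploy.accept_request?}" }
```

The window between step 2 and the `ensure` clause is the downtime that the rest of the talk sets out to remove.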
37. CHALLENGES IN SNAP
Long running builds
Running multiple versions of your app at the same time
Database migrations
38. FEATURE TOGGLES
Hide unfinished UI elements
Control backend behaviour
Test with feature toggles
Avoid multi-component feature toggles
http://martinfowler.com/bliki/FeatureToggle.html
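As a rough illustration of the first two points, a toggle can be as small as an in-memory lookup. The sketch below is an assumption, not Snap CI's implementation: `FeatureToggles` and `job_tabs_html` are hypothetical, and only the `feature_enabled?` name and the `:parallel_stage` toggle appear elsewhere in the slides.

```ruby
# Hypothetical in-memory toggle registry; names are illustrative only.
module FeatureToggles
  @enabled = {}

  class << self
    def enable(name)
      @enabled[name.to_sym] = true
    end

    def disable(name)
      @enabled[name.to_sym] = false
    end

    # Unknown features default to off, so unfinished work stays hidden.
    def enabled?(name)
      @enabled.fetch(name.to_sym, false)
    end
  end
end

def feature_enabled?(name)
  FeatureToggles.enabled?(name)
end

# Hide an unfinished UI element: render nothing while the toggle is off.
def job_tabs_html(jobs)
  return "" unless feature_enabled?(:parallel_stage)
  jobs.map { |job| "<li>#{job}</li>" }.join
end
```

Keeping the check behind a single `feature_enabled?` call makes it easier to find and delete every toggle branch once the feature is fully released.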
41. TESTING WITH FEATURE TOGGLES
describe "multiple jobs" do
  describe "feature enabled" do
    before(:each) do
      with_feature_enabled(:parallel_stage)
    end
    it 'should not show the job tabs when there is only one job' do
    end
  end
  describe "feature disabled" do
    before(:each) do
      with_feature_disabled(:parallel_stage)
    end
    it 'should not show the job tabs but should show the logs' do
    end
  end
end
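The slide shows only the call sites of `with_feature_enabled` and `with_feature_disabled`. One possible shape for such helpers is sketched below; the module name, the `TOGGLES` store, and `reset_feature_toggles` are assumptions made for illustration.

```ruby
# Hypothetical test helpers; the slide does not show their definitions.
module FeatureToggleHelpers
  # Stand-in for the application's toggle store; unknown toggles read as off.
  TOGGLES = Hash.new(false)

  def with_feature_enabled(name)
    TOGGLES[name] = true
  end

  def with_feature_disabled(name)
    TOGGLES[name] = false
  end

  def feature_enabled?(name)
    TOGGLES[name]
  end

  # Clear all toggles so state does not leak from one example to the next.
  def reset_feature_toggles
    TOGGLES.clear
  end
end
```

With RSpec, a module like this could be mixed into the examples through `config.include` and reset from an `after(:each)` hook, so that each example starts with every toggle off.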
42. CHALLENGES IN SNAP
Long running builds
Running multiple versions of your app at the same time
Database migrations
46. POPULATE ALL EXISTING STAGES
stages
  id
  started_at
  completed_at
  result
  …
jobs
  id
  stage_id
  …
47. COPY ATTRIBUTES TO JOB
stages
  …
  started_at
  completed_at
  result
  …
jobs
  …
  started_at
  completed_at
  result
  …
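The two data steps above (give every existing stage a job, then copy the stage's attributes onto it) could be sketched as follows. This uses plain structs instead of a real database, and it assumes one job per existing stage; `StageRow`, `JobRow`, and `backfill_jobs` are hypothetical names, and only the column names come from the slides.

```ruby
# Hypothetical stand-ins for rows of the stages and jobs tables.
StageRow = Struct.new(:id, :started_at, :completed_at, :result, keyword_init: true)
JobRow   = Struct.new(:id, :stage_id, :started_at, :completed_at, :result, keyword_init: true)

# Creates one job for every stage that has none yet and copies the stage's
# timing and result onto it. Stages that already have a job are skipped,
# so the backfill can be re-run safely.
def backfill_jobs(stages, jobs)
  covered = jobs.map(&:stage_id)
  next_id = (jobs.map(&:id).max || 0) + 1
  stages.each do |stage|
    next if covered.include?(stage.id)
    jobs << JobRow.new(id: next_id, stage_id: stage.id,
                       started_at: stage.started_at,
                       completed_at: stage.completed_at,
                       result: stage.result)
    next_id += 1
  end
  jobs
end
```

Because the old columns on `stages` are left untouched, a version of the application that still reads them can keep running while the backfill is in progress.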
48. SWITCH THE APPLICATION TO START USING THE JOB MODEL
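# Before switch: the result is stored on the stage itself
class Stage
  attr_reader :result
end
# In transition: the feature toggle decides which source is used
class Stage
  def result
    if feature_enabled?(:parallel_stage)
      results = jobs.collect { |job| job.result }
      return :failed if results.include?(:failed)
      :passed
    else
      # Read the stored attribute; calling result here would recurse forever
      @result
    end
  end
end
# After switch: the result is always derived from the jobs
class Stage
  def result
    results = jobs.collect { |job| job.result }
    return :failed if results.include?(:failed)
    :passed
  end
end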
51. LESSONS LEARNT
Zero downtime is not easy
Having reliability, automation, and frequent releases helped
Watch out for your data
Things will go wrong, but we keep learning and keep improving