Applications built over the years carry historical design assumptions, such as: it is acceptable to take a system out for upgrade maintenance for a few hours every 6 months.
In today’s world, embracing continuous delivery practices means more frequent releases, which in turn means more downtime. Finding a good maintenance window also becomes a struggle, both for users spread across the world and for the operators managing the upgrade outside business hours.
In this talk, I demonstrate that mapping out complex deployment processes makes it possible to prioritise work and progressively reduce the deployment impact. I will also give practical advice on how to tackle blockers to zero-downtime deployments, such as:
Migrating database schemas while keeping an application running
Ensuring backward compatibility of messages and APIs
Dealing with long-running background jobs
Mitigating user session loss

Deploying without the comfort of a maintenance window also means that stability during the upgrade is a critical concern. I will go through how it can be achieved through systematic pipeline automation and good system visibility to help operators during the upgrade.
The trick is: zero-downtime doesn’t mean everything is up or running the latest version, it only means nobody notices!
4. @PierreVincent
“There has been a massive earthquake in New Zealand and I need to use Poppulo for regular updates. Please can you advise when it will be back online.”
– Poppulo customer
5. @PierreVincent
2009: Deploying every 3 to 6 months, 4 hours downtime, on Sunday at 5PM
2015: Deploying every 4 weeks, 2 hours downtime, on Sunday at 8PM
2018: How do we go faster without impacting users?
10. @PierreVincent
Zero-downtime deployments don’t mean everything stays up or that everything is immediately running the latest version.
They simply mean users don’t notice a thing while all this is happening.
11. @PierreVincent
Deployment steps
Run database migrations
Enable maintenance mode
Wait for queued jobs to complete
Shut down services
Upgrade services
Start services
Wait for services startup
Disable maintenance mode
Step durations shown: 15-60 mins, 5-30 mins, 15 mins
User impact: limited functionality, downtime
13. @PierreVincent
Online database migration
Decouple schema version from application version: application [N] must work with schema [N+1]
Ensure backward compatibility with non-breaking changes only: no destructive operations to tables/columns in use; use expand/contract to split breaking changes
Limit impact to live traffic: detect changes likely to cause locking problems
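As an illustration of the expand/contract idea, here is a minimal sketch using Python's built-in sqlite3 module. The `users` table, the `name` and `full_name` columns and all function names are hypothetical and not taken from the talk; the sketch only shows that application [N] keeps working once schema [N+1] is in place.

```python
import sqlite3


def expand(db):
    """Schema [N+1]: add the new column, leave the old one untouched."""
    db.execute("ALTER TABLE users ADD COLUMN full_name TEXT")


def backfill(db):
    """Copy existing values into the new column; safe to run more than once."""
    db.execute("UPDATE users SET full_name = name WHERE full_name IS NULL")


def app_n_read(db, user_id):
    """Application [N]: only knows about the old column."""
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0]


def app_n1_write(db, user_id, value):
    """Application [N+1]: writes both columns while [N] may still be running."""
    db.execute(
        "UPDATE users SET name = ?, full_name = ? WHERE id = ?",
        (value, value, user_id),
    )


def app_n1_read(db, user_id):
    """Application [N+1]: reads the new column."""
    row = db.execute("SELECT full_name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0]


# Hypothetical starting point: schema [N] with a single row.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")

expand(db)
old_reader_result = app_n_read(db, 1)  # application [N] against schema [N+1]
backfill(db)
app_n1_write(db, 1, "Ada Lovelace")

# Contract (dropping `name`) is a separate, later release, once no running
# application version reads or writes the old column.
```

The sketch leaves out writes made by application [N] after the backfill; a real migration would need to keep the two columns in sync for as long as both versions are running.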
15. @PierreVincent
More on schema migrations
Baron Schwartz - DevOps for the database
Chapter: Loosening the Application/Database coupling
www.vividcortex.com/resources/devops-for-the-database-ebook
Michiel Rook - Database Schema Migrations with Zero Downtime
speakerdeck.com/mrook/database-schema-migrations-with-zero-downtime-continuous-lifecycle-london-2019
17. @PierreVincent
Full upgrade: instances 1 and 2 both go from Up [N] through Drain, Stop, Upgrade, Start to Up [N+1] at the same time. Result: feature downtime.
Rolling upgrade: instances 1 and 2 go from Up [N] through Drain, Stop, Upgrade, Start to Up [N+1] one after the other. Result: feature continuously available.
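The difference between the two strategies can be sketched with a small simulation. This is an illustrative model only: the `Instance` class and both functions are hypothetical, and draining and stopping are collapsed into a single "not serving" state.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    version: str
    serving: bool = True


def serving_count(pool):
    """Number of instances currently able to take traffic."""
    return sum(1 for instance in pool if instance.serving)


def full_upgrade(pool, new_version):
    """Take every instance down, upgrade them all, then start them all."""
    availability = []
    for instance in pool:
        instance.serving = False  # drain + stop
    availability.append(serving_count(pool))
    for instance in pool:
        instance.version = new_version  # upgrade
    for instance in pool:
        instance.serving = True  # start
    availability.append(serving_count(pool))
    return availability


def rolling_upgrade(pool, new_version):
    """Upgrade one instance at a time so the others keep serving."""
    availability = []
    for instance in pool:
        instance.serving = False  # drain + stop
        availability.append(serving_count(pool))
        instance.version = new_version  # upgrade
        instance.serving = True  # start
        availability.append(serving_count(pool))
    return availability


full = full_upgrade([Instance("1", "N"), Instance("2", "N")], "N+1")
rolling_pool = [Instance("1", "N"), Instance("2", "N")]
rolling = rolling_upgrade(rolling_pool, "N+1")
```

In this model the lowest value in `full` is 0 (feature downtime), while the lowest value in `rolling` is 1, so at least one instance is always serving. During the rolling upgrade both versions run side by side, which is why backward compatibility between [N] and [N+1] matters.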
22. @PierreVincent
Limiting risk throughout the transition (Oct 2018 to Jan 2019)
Deploy time: Sunday night, then Monday 8am, then on-demand
Live traffic: none, then limited, then full
Customer notice: planned 3h maintenance for upgrade (7 days email notice), then planned maintenance with no expected impact (in-app message), then none
Deployment ownership: System Operations, then Dev
23. @PierreVincent
Zero-downtime deployments don’t mean everything stays up or that everything is immediately running the latest version.
They simply mean users don’t notice a thing while all this is happening.
Thank you!
@PierreVincent
pvincent.io