The Why and How of Continuous Delivery
1. The Why and How of Continuous Delivery Nigel McNie getyourgameon.co.nz
Editor's notes
Slight detour - the "what" of CD. It's a strategic decision about "ops" (business operations). You attain the goal of fast development of high-quality software through build/test/deployment automation. Many people get as far as build/test automation, so we're really just going one step further. It's a strategy - there's no prescribed way to do it, and many tactics you can use. We'll visit some of them later. But first, I should justify the "why". Why deployment automation?
To understand why, we need to look at the "development landscape". The business needs a stable, working product to function. Furthermore, product development does not happen in a vacuum. From time to time, there are legitimate business reasons that something needs to be done by a certain time (e.g. a conference, or "marketing need a date to target"). Teams are unstable - you might start with just one developer and grow to hundreds, staff stick around for a few years, and you'll make bad hires. Security vulnerabilities are discovered, urgent/important bugs are found, you need to respond to competition, you want to seize opportunities, and customers have general expectations too. Debt accrues over time, and needs to be reduced whenever possible, lest it kill the product entirely.
I was going to do a slide on other deployment strategies, but unlike development strategies (waterfall/agile/extreme etc.), "deployment strategies" is barely a recognised term, let alone a researched one. Some of the terms that come up:
- Waterfall - the "when it's done" release methodology, which revolves around the (delusional) idea that "releases have 0 bugs".
- Agile - releases every two weeks or so, likely to be a reasonably well-thought-out and somewhat automated process.
- BAU - "weekly" is pretty standard here; again, the process is generally "stable".
- Hotfixing/firefighting - required on every project. Some projects have a separate path to production for this.
- Marketing Release - "The new features go live on this day - we took out ads".
- 5am release - or indeed any time after hours - done because your releases need scheduled downtime.
- Friday Afternoon release - phenomenally stupid idea :)
- Scheduled Downtime - many projects need this to do releases (restarting services, applying upgrades etc.).
- Unexpected Downtime - when releasing goes wrong :)
- Merge Hell - trying to get features merged before a release, and the inevitable conflicts this entails.
- String Freeze - the language team's attempt to get some kind of stability before a release.
- QA Approval - some team of humans between the devs and production.
- Management Signoff - some team of people ill-qualified to make development decisions between the devs and production.
So what's wrong with most strategies today? They ignore the realities of development - for example, that vulnerabilities don't care what your release schedule is. They prevent any possibility of instant gratification for a customer, no matter how easy the fix is.

Worse, when something really needs to be fixed, process will often be subverted. "We are losing millions" does not get "fixed in the next release". It certainly doesn't get properly considered signoff. It gets hacked, often in production, over the protests of sysadmins, and in some cases in violation of legislation. In effect, your "stable" release process contributes to you being out of control when you need control most. Some get around this by having a separate "hotfix" path to production. This feels like a hack to me. Firstly, if you're not running your test suite in that path, it's instantly dangerous. And if you rely on QA to check things, it's even more dangerous, because it's easy to skip QA entirely.

These strategies have other insidious effects. The more you delay, the more inefficiency you introduce. When people branch, they do it in the name of stability (so others won't have to deal with a broken codebase), but all they're really doing is kicking the problem into touch. Merge hell is a stability problem of its own. Worse, people are discouraged from doing refactoring that pays down technical debt, because they know that incoming branches will cause horrible merge hell.

There's also the problem of wasted code. The longer it takes to release, the more likely you are to develop code that's never used, fix bugs that were never a problem, or waste brain cycles on something irrelevant. This is Waste, to be eliminated.
I think these problems can be summed up in two observations. One: releases are a source of friction. They could be low friction, or high. As the friction grows larger, you will struggle to react, have more stability problems, build more technical debt, and so on.
Two: releases are feared. They straddle the boundaries of dev and ops, they're a point of high risk of spectacular failure. The less you confront this problem, the scarier they'll seem, the harder they'll be, the slower you'll go... Slower is an important point. At the point where teams have problems with the release process, they often make the decision to go slower. Against the development landscape, and with the risk of waste increasing, this is a flawed strategy.
I've outlined a raft of problems with the way releases are done currently. Now it's time to look at why CD is worth doing. Here's a reminder of the development landscape. These are the requirements we are trying to fulfill. Let's have a look at the CD strategy, and see how it fits.
CD is centred around the idea of being able to deploy to production at any time, reducing release friction. So already, assuming you can implement it, you can see how in theory it will help with the challenges issued by the development landscape. At a high level, there are only two requirements that need to be fulfilled for this to be a reality. 1. As much of the deployment process as possible needs to be automated. It pretty much has to be at the level of one command, one button push or suchlike. Having said that, I personally don't go that far - simply because the dev team is just me. It's a two-command deploy. 2. Given the last requirement is optional in certain cases, the only hard rule of CD (that I've come across) is that one branch should always be deployable. If it's going to be possible to deploy at any time, something always has to be deployable, right? The general mechanics of how CD works in practice: a commit makes it to mainline, this triggers an automated test run (CI), and if the run passes, the code is deployed straight to production. Let's look at this in more detail.
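Those mechanics - commit to mainline, automated test run, deploy on green - can be sketched as a tiny Python function. This is a minimal sketch, not anyone's real pipeline: `run_tests` and `deploy` are hypothetical callables standing in for your actual test suite and deploy script.

```python
def continuous_delivery_pipeline(run_tests, deploy):
    """Run for every commit that lands on mainline.

    `run_tests` and `deploy` are hypothetical stand-ins for your real
    test command and deploy script.
    """
    if run_tests():          # CI: the automated test run
        deploy()             # green build goes straight to production
        return "deployed"
    return "blocked"         # a red build never reaches production
```

In a real setup this logic lives in your CI server or a VCS hook; the point is simply that there is no human gate between a green run and production.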
As all commits need to be deployable, it helps to keep them small. This is something you should be doing already where possible. If you're using a newer VCS, pushing a small series of commits is fine too. This process applies to all commits - from small bug fixes up to major feature development. How do you hide new feature developments from end users? There's a couple of ways. One is by using feature branches and another is by using feature flags/flippers.
Feature branches are a well-understood concept, and an easy way to hide code that isn't ready yet. The trouble is, they make you vulnerable to the Merge Hell we talked about previously. Recapping: merging with master can cause conflicts (even if the patch applies OK), and feature branches can prevent people from refactoring. This problem can be managed - the GitHub team do things this way just fine. People merge master into their branches often - they're responsible for maintaining the branch, so they are responsible for any merge headaches, and should have them resolved before they merge back to master.
Feature flags are essentially feature branches implemented in code. Think of it as an extra subsystem in your code, like the logging subsystem. A simple implementation would be a config file which says what features are enabled, and if statements in the code. It can be as complex as you like - Flickr built a dashboard to manage their feature flags. And it can give you dark launches, split testing, separation of marketing release from deployment, and built-in kill switches.
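The "config file plus if statements" implementation really is that small. Here's a hedged sketch - the flag names and the in-memory dict (standing in for a real config file or dashboard) are invented for illustration:

```python
# A minimal feature-flag subsystem: a mapping of flag names to states
# (in real life, loaded from a config file or a dashboard like Flickr's).
FEATURES = {
    "new_checkout": False,   # hypothetical flag: still in development, hidden
    "dark_mode": True,       # hypothetical flag: finished and enabled
}

def feature_enabled(name: str) -> bool:
    # Unknown flags default to off - a built-in kill switch.
    return FEATURES.get(name, False)

# In application code, unfinished features sit behind an if statement:
def render_checkout() -> str:
    if feature_enabled("new_checkout"):
        return "new checkout page"
    return "old checkout page"
```

Flipping one config value turns the feature on for everyone, with no deploy needed - which is exactly what separates the marketing release from the code release.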
Our goal is that a commit triggers a deployment. But you don't deploy without testing, so a commit triggers a test run first. This is Continuous Integration - nothing new. The tests should run quickly, because we want to deploy once they're done. To put this in perspective, IMVU deploys up to 60 times a day at last count. Their tests take 9 minutes, a code push takes 6 minutes, and they pipeline, so they can deploy every 9 minutes. Amazon deploys every 11.6 seconds, with a peak of 1,079 deploys/hour. That's how fast your tests have to run if you want this to work. Mechanics: a CI server, a hook in your VCS, or just do it manually. Naturally, a test failure prevents deployment. Flickr goes one step further - the commit is reverted.
Once the tests have passed, a deployment should happen - preferably, triggered automatically. It should be as automatic as possible, and should cause as little downtime as possible. Scripting languages make this easier, but even in other languages, you can use various tricks to ensure zero downtime. People should not be afraid to do a deployment. It should be so simple that even a manager could do one! Or at least, a designer/copywriter. This is a key point: you should be happy that, even under conditions of high load or intense public interest, anyone in the dev team can trigger a deployment. Thought experiment: if a release had no changes, could your most junior developer do a deployment safely? This can go to extremes for large sites: "Cluster Immune" systems.
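One common zero-downtime trick (an assumption on my part, not something the talk spells out) is to unpack each release into a fresh directory and atomically flip a `current` symlink that the web server serves from. A sketch, with hypothetical paths:

```python
import os

def activate_release(release_dir: str, current_link: str) -> None:
    """Point `current_link` at `release_dir` atomically.

    Creating the new symlink under a temporary name and then renaming it
    over the old one means requests never observe a half-updated link.
    """
    tmp = current_link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)            # clean up a leftover from a failed run
    os.symlink(release_dir, tmp)  # build the new link off to the side
    os.replace(tmp, current_link) # rename(2): atomic on POSIX filesystems
```

Rolling back is the same operation pointed at the previous release directory, which is part of why this pattern pairs so well with CD.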
It should always be possible to roll back to the previous commit. This is made easier because you are making small changes. Database changes: make a forward and a back patch, and test both before committing. Data munging: you can keep old data/columns around, even for weeks after the initial change. Write to both locations; then later, you can drop the old stuff. If a problem is minor/tolerable/not worth rolling back for/just spotted: write a test and deploy again! You can see how problems will be fixed much faster than by "waiting for the next deployment". A key assumption of CD: in contrast to "0 defects", you have "every defect once". However, you're also contracting to write tests to cover breaks. (You can be lazy; sometimes this is useful.) This means your test suite contains basic sanity tests, plus tests for things that actually break - a powerful combination. If you think you can prevent failure with "0 defects", you're not developing your ability to respond when you get one. Better to expect and embrace failure, and develop systems to limit its effects, than run from the idea and be bitten worse.
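The "write to both locations" tactic deserves a concrete sketch. Suppose (hypothetically - this schema is invented for illustration) you're splitting an old `fullname` field into separate `first`/`last` fields. During the transition, new code writes both, so rolling back to old code that reads `fullname` stays safe:

```python
def save_user(record: dict, first: str, last: str) -> dict:
    """Dual-write during a schema migration, sketched on an in-memory dict.

    New code reads `first`/`last`; old code reads `fullname`. Writing both
    keeps a rollback safe. Once the new code has been stable for a while,
    drop the old field (and this extra write).
    """
    record["first"] = first                  # new location
    record["last"] = last
    record["fullname"] = f"{first} {last}"   # old location, kept for rollback
    return record
```

The same idea applies to real database columns: add the new columns, dual-write for as long as rollback must stay possible, then drop the old column in a later deploy.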
I hope you can see by now how CD is actually a strategy for reducing risk. Smaller changes are less risky, and you're always running the tests, which will only get better as time goes on. If you think it's risky, you may be assuming that what you do now doesn't have the same risk! So before pointing out a risk, consider if you don't already have the problem. "Avalanche model" - would you rather have many small avalanches, or a few large ones? Slower release cycles are risky! Negative feedback loop: "it's so risky we only do it four times a year", and corresponding positive feedback loop: the more you do it, the easier and safer they become.
Maybe you can see why CD isn't insane - now let's have a look at why it's better.

It improves your responsiveness. Urgent/important bugs can be fixed right away - you'll look good with your customers. No gatekeeper - devs can work at their own schedule. It breaks down the barriers between devs and customers. When production breaks days or weeks after the code is written, this is bad! You have to break flow to investigate, and since it was a while ago, it's harder to work out what is wrong.

CD improves your testing habits by giving you a good incentive to write tests. Tests are no longer "possibly" useful - they actually "are" useful, because they're in the way!

Less pressure: separating the marketing schedule from the release schedule is a win all by itself. You already deployed it, the tests already pass, you know it will work. No need to schedule deployments - no need to be at work after hours, and less need to involve ops (some release processes require ops, and some require ops when they break). No need to worry about "missing the next release" - there's always a next release!

There's less waste. People have to deal with merge conflicts immediately, because they can't check in broken stuff. And you're able to run quick experiments to see if a feature is really wanted.

Improved dev attitude: developers will take an interest in a feature beyond checkin, because they can push it live instantly. Then they can see how users interact with the feature. It's a death blow to the "it works for me" attitude.
CD lets you go fast with confidence. This ends the Cowboys vs Astronauts fight.
Financial Institution: "Every defect once might be sensible for you, but..." If you had started with CD, and had years of tests (as you should have anyway), wouldn't you be a little more confident? And if not, I assert that this is the fault of your test suite.

Outsourcing QA: you can still involve QA, via feature flags or a simultaneous staging deploy (where the flags are turned on). QA people can test on production. What harm could they do that you couldn't recover from? What damage could they do that you wouldn't want to know about?

"Batching changes is safer": it is not true that waiting longer means you're more likely to find bugs. The best way to find bugs is with other eyes, and you can do this without batching. Even the threat of more eyes can improve quality: there's a psychological effect to knowing your changes are going straight to production. Batching means bigger changes, which mean more risk.
"We'd never get signoff for this": Well, "we're losing millions" never goes through signoff either. Get management to sign off on the process, perhaps conceed "freeze periods", they'll soon come around. "Legislation forbids us from..." Devs hack stuff on production all the time! In violation of legislation! Some dev always has root access. They get it during a disaster, and keep it until passwords are changed. Or it's sanctioned. Wouldn't it be better if your firefighting process was the same as your normal one? "GST goes up to 15% on XXX..." As in, we have times when we need to run certain code before and different code after. Use a time based feature flag! Related: "we have an external dependency". Your test system should be mocking this. Related: "external system changes on...". Time based feature flag. "Our tests now take hours to run!": Yes, your tests will grow large. You need to keep the runtime down, so parallelise. IMVU: 4.5 wallclock hours of tests run in 9 minutes. "That's all good for you SaaS guys, but..." Google chrome - great example of a client app doing something like this. Moodle: download version "2.1+". IOS/Testflightapp
This concept is only a few years old. It has its origins in Toyota's manufacturing practices, and was popularised by Eric Ries/IMVU. It's already heavily adopted by some of the smartest companies in tech, and people are working out how to get it into the enterprise, hardware development, etc. It has major traction (although no Wikipedia page yet). Flickr, Amazon, Etsy and Netflix are examples - Amazon deploys once every 11.6 seconds, and Etsy deploys 25 times/day and reported it improved morale.