2. 2
…
PDP-7
~1/2M USD (2018 equivalent) video console!
In the 1960s, while working on the Multics operating system at Bell Labs, Ken Thompson created the video
game “Space Travel”.
Bell Labs withdrew from the Multics project. To keep playing the game, Thompson found an old
PDP-7 machine and rewrote Space Travel on it.
The tools created to make Space Travel later became the Unix operating system.
Space Travel
Source: Wikipedia
4. 4
1985: 32M USD Cray-2 with liquid cooling
1990: 50k US$ HP Workstation
1995: Millions of $$$ – SGI Challenge Array
• Many environments in 80’s and 90’s already heavily automated.
• Start of a shift from SHELL based automation to languages like Perl
• HW often had many functions to support automation (maybe sometimes better than today’s x86 HW).
• Tens to thousands of users on a single server/OS. Upgrade/migration very complex.
Setup evolved over several years:
• very hard to clean up when upgrading/removing/rolling back software
• impossible to reproduce environments on a new server
Extremely high price for HW defined many operational patterns!
5. 5
Shared HW in labs with multi-user support – How to achieve consistency?
Consistency across many servers/environments could be achieved in many ways
- Scripts managing groups of individual servers
- Diskless nodes (one filesystem)
- Distributed file systems allowed managing many computers “as one”
Example: HP DUX, which allowed “one filesystem” even in a mixed HW environment
• Context dependent files allowed, for instance, different files per CPU type at the same location
• Distributed device nodes and distributed named pipes (cool stuff!)
However, tight coupling made the distributed filesystem a risk for the whole cluster
Only admin users could change system/application files.
Friction between “root” and the users gave birth to stories like
BOFH – The Bastard Operator From Hell!
7. 7
…
Desktop:
• Each user could get their own machine with local disk
• Users got more control of their own environment (there were few mechanisms to prevent it!)
• Friction towards “root” dramatically reduced
Servers were in many ways a step backwards:
• Most 1990-2000 x86 HW had less automation support than Unix servers 10 years earlier
• Users paid for “their own server”, but many Unix security aspects depended on control of root
-> friction between server ”owner” and operator
• More HW variations, more OS variations, less consistency in tools across HW and SW
-> admin workload per server increased (but HW cost savings made this ok…)
• By 2000, some companies started getting very good at automating x86 as well, but at significant dev cost
• However… cost/performance got significantly better!
8. 8
Traditional process to get new servers:
1. Meeting to share request
2. 2-5 meetings to discuss architecture
3. Budget, order, deliver, rack & cable
4. OS setup
5. Typically 3 to 6 months in total
6. Surprisingly often… servers slipped into the next budget year…
9. 9
- High lead time for new HW
- A large number of custom HW variations caused big administrative overhead and extra cost
- “Ownership of HW” (HW hugging) created a mental barrier blocking efficient HW use
- Some core Unix dependencies such as NFS prevented “delegating root”
- Many technical solutions (NFS is again an example) also prevented the efficient scaling and failure handling often needed in Internet services
11. 11
Last 3 years: efficiency per clock +10%, increase in cores +50%
[Chart: CPU clock frequency, core count, and clock efficiency over time]
Increase in CPU cores has been the main contributor to CPU performance since 2004
Time to virtualize!
12. 12
“Of the AWS instances monitored by 2nd Watch, 38 percent are small. There's a
significant gap between the next most numerous instance size, which is medium
(making up 19 percent of total instances monitored).”
Source: 2014 report from AWS management system provider 2nd Watch.
How powerful is a “small” instance?
System            Multi-core Geekbench    Dhrystone
EC2 ”t2.small”    2,822                   35,003,920
iPhone X          10,641                  –
Huawei P20 Pro    6,753                   100,945,863
13. 13
Virtualization successfully decoupled HW from service!
• No more server hugging!
• Easy to keep “spare resources” for immediate deploy from VM image.
• Could use standard HW regardless of VM config. Dramatically simplified admins’ lives.
• Provisioning time down from months to days or even immediately.
• Commercial products like vSphere made things “easy”
[Diagram: multiple servers, each running a hypervisor hosting many VMs]
14. 14
Huge shift!
• 1 mid-high spec 2010 server could quickly turn into 30-50 smaller VMs
• From many services per server -> 1 service per VM
• Many new tools on the scene to automate and track.
• Make image -> Clone! Clooone! Cloooooooooone!
• Cloning was predictable; however, maintaining a predictable and reproducible state did not get easier when the
number of hosts grew from 1 physical server to 30-50 VMs.
15. 15
• Endless resources! (Budget?? What’s that??)
• Allowed dev teams to bypass server admins. Great organizational hack but terrible for security…
• Good APIs for automated generation of VMs (see the sketch after this list)
• Not cheap, but in a way even this benefits the user:
many companies adopted automation just to be cost-efficient through dynamic scaling.
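As a hedged illustration of such an API, the sketch below uses the AWS boto3 SDK to launch and then dispose of a small VM; the region and AMI ID are placeholder assumptions, not values from the slides.

```python
import boto3

# Minimal sketch: create (and destroy) a VM through a public cloud API.
# The region and AMI ID below are placeholders.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder image ID
    InstanceType="t2.small",
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print("Launched", instance_id)

# Disposable infrastructure: tear the VM down again when no longer needed.
ec2.terminate_instances(InstanceIds=[instance_id])
```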
16. 16
Public cloud and internal virtualization allowed (and even forced) the creation of DevOps
- DevOps is an approach rather than a distinct role (a lot of confusion and disagreement here…)
- The core of DevOps is a toolchain for automation
- Code -> Build -> Test -> Package -> Release -> Configure -> Monitor (and repeat from the start!)
- All parts of the toolchain should be automated as much as possible.
Most of the friction between infrastructure and development literally gets automated away…
17. 17
Predictable environments
- Manual changes to environments cannot be allowed (humans are not predictable!)
- Automate tests as much as possible, including performance and failure tests.
- All deploy operations should be automated
- Always build from a predictable baseline. Never change after build. Build again instead
- Immutable and disposable infrastructure!
- Build the environment once, deploy it many times (see the sketch below)
[Diagram: Build once, then Deploy -> Run -> Destroy repeatedly for Development, Test, QA, and Production (x X)]
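A minimal sketch of the “build once, deploy many” idea, assuming hypothetical tooling: the build and deploy commands (make build, deploy-tool) and the registry name are placeholders for whatever is actually used.

```python
import subprocess

def build_image(version: str) -> str:
    """Build the immutable artifact exactly once (hypothetical build command)."""
    image = f"registry.example.com/myapp:{version}"   # placeholder registry/name
    subprocess.run(["make", "build", f"IMAGE={image}"], check=True)
    return image

def deploy(image: str, environment: str) -> None:
    """Environments are never patched in place; they are re-created from the image."""
    subprocess.run(["deploy-tool", "--env", environment, "--image", image], check=True)

if __name__ == "__main__":
    artifact = build_image("5.24")                 # build once
    for env in ["dev", "test", "qa", "prod"]:      # deploy the same artifact everywhere
        deploy(artifact, env)
```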
18. 18
Private branch -> Private build -> Unit test -> Merge to official branch
Official build -> Unit test -> Packaging
Provision CI environment -> Deploy to CI -> Integration tests
Provision QA environment -> Deploy to QA -> Integration tests -> Manual tests
Provision Prod environment -> Smoke tests -> Release
When successfully done – can often drive 3-10x increases in dev output (a minimal sketch of such a pipeline follows)
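As a hedged sketch only: the snippet below runs such gated stages in order and stops at the first failure. Every command name (make, provision, deploy, run-tests) is a hypothetical placeholder, not a tool named on the slides.

```python
import subprocess
import sys

# Pipeline stages in slide order; each command is a hypothetical placeholder
# for the real build/test/provision/deploy tooling.
STAGES = [
    ("unit tests",           ["make", "test"]),
    ("packaging",            ["make", "package"]),
    ("provision CI env",     ["provision", "ci"]),
    ("deploy to CI",         ["deploy", "ci"]),
    ("CI integration tests", ["run-tests", "integration", "--env", "ci"]),
    ("provision QA env",     ["provision", "qa"]),
    ("deploy to QA",         ["deploy", "qa"]),
    ("QA integration tests", ["run-tests", "integration", "--env", "qa"]),
    ("provision Prod env",   ["provision", "prod"]),
    ("smoke tests",          ["run-tests", "smoke", "--env", "prod"]),
]

for name, cmd in STAGES:
    print(f"==> {name}")
    if subprocess.run(cmd).returncode != 0:
        sys.exit(f"Pipeline stopped: '{name}' failed")   # never release a broken build
print("Release!")
```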
19. 19
Great abstraction for immutable systems!
- Can layer filesystems to dramatically speed up build, test and deploy of environments (sketched below)
- Easier, faster and less error-prone to distribute than installing “thousands” of packages
- Very lightweight (vs. VMs)
- Minimal performance overhead
[Diagram – Typical VM setup: Server -> Hypervisor -> per-VM Guest OS, Libraries, Application.
Typical container setup: Server -> Host OS -> Container control layer -> per-container Libraries, Application.]
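Purely as a conceptual sketch (not how overlay filesystems are actually implemented), the snippet below shows the layering idea: each layer stores only its own changes, and file lookups fall through from the application layer down to the base OS layer. Because unchanged lower layers can be cached and shared, only the small top layer has to be rebuilt and distributed.

```python
# Conceptual sketch of image layering: top layer first, lookups fall through.
base_os     = {"/bin/sh": "shell v1", "/lib/libc.so": "libc 2.28"}
libraries   = {"/usr/lib/libssl.so": "openssl 1.1"}
application = {"/app/server.py": "myapp 5.24", "/lib/libc.so": "patched libc"}

layers = [application, libraries, base_os]      # search order: newest layer first

def read(path: str) -> str:
    for layer in layers:                        # first layer containing the file wins
        if path in layer:
            return layer[path]
    raise FileNotFoundError(path)

print(read("/app/server.py"))   # served from the application layer
print(read("/lib/libc.so"))     # shadowed by the application layer
print(read("/bin/sh"))          # falls through to the base OS layer
```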
21. 21
In the past:
- Rough plan
- 4-10 Architecture meetings
- Cost Calculation
- If cost<Budget
- Approve HW cost
- Wait 6 months
- Launch service
- If cost>Budget
- Wait for next year
- Hope you did not forget the whole plan by then…
Maybe 1-4 updates of dev/stg per month
Today, compute is just a dependency:
(Imaginary deployment profile)
Instances: 100
Memory: 4G
Cores: 4
OS: CentOS 7.4
Container image: myapp:5.24
>deploy myapp.profile
Wait 2-3 minutes and the app runs on 100 containers.
Can re-create dev/stg/test many times a day.
22. 22
…
[Diagram: stack of Facility / Network / Server / OS / Application, with an ugly line of organizational friction between OS and Application – BOFH below, clueless developer above]
One of the biggest benefits of DevOps-style automation is the reduction in
friction between dev and infrastructure teams!
Friction in this case is defined as “actions you have to take only to be able
to proceed but that add no value in terms of improving quality”
[Diagram labels: Developers / Cloud Platform Team]
25. 25
Typical Internet service
- 1-2 releases per quarter: you may be able to maintain service quality
- > 3 releases per quarter: you may be improving service quality
- Fastest improving service parts in Rakuten: 20-30 A/B tests per quarter.
A ~50% failure ratio on these tests is normal.
26. 26
Modern automation often allows developers to move faster than planning and decision making.
20 A/B tests per quarter = ~3 days for each test if done sequentially (Plan/DEV/QA/AB/Release)
How do you structure your teams for such faster processes?
[Diagram: possible team structures for services A, B, C, built from PDM/PJM, DEV, QA, and Infra roles]
Cross-functional? Matrix? A broken Tetris structure? Something else?
27. 27
Traditional style:
VMs with semi-automated deploy and QA.
2-3 A/B tests per quarter. Flat for 2 years.
Change to containers. Full CI pipeline with more than 1,000 test cases and automated deployment.
More than 20 A/B tests per quarter.
New cross-functional team structure, same team size.
CVR data for one of Rakuten’s major site parts.
28. 28
Speed vs. stability and risk. Need to keep the balance!
Storage needs to be handled differently from stateless applications.
Lower layers need to provide flexibility and self-service so higher layers can RUN by themselves!
[Diagram: slow-moving lower layers (Facility, HW, Storage, Control/Admin, API) supporting fast-moving App 1, App 2, App 3, …]
29. 29
1969: 1/2M USD video console turned into Unix
1980-1995: Multi-user / multi-service, very expensive machines
1992-2000: Move starts towards dedicated-purpose machines. Still 1 machine, many services
2004-2010: Virtualization starts: 1 service process per VM getting normal
2008-2012: Public cloud takes off and makes VMs easier than ever!
2010-2012: DevOps describes an approach to make safer and faster operation at scale
2015 ->: Containers make operations at scale faster than ever. Application vs. infrastructure container concepts
2018 ->: Near future likely to bring
- Faster processes
- Include new things like networking in the automation