My presentation at the Docker meetup in Seattle on January 28, 2015. Video available at https://www.joyent.com/developers/videos/docker-and-the-future-of-containers-in-production
1. Docker and the Future of Containers in Production
Bryan Cantrill
CTO
bryan@joyent.com
@bcantrill
2. Prehistory: Virtualization as cloud catalyst
• In the 1960s — shortly after the dawn of computing! —
pundits foresaw a compute utility that would be public
and multi-tenant
• The vision was four decades too early: it took the
internet + commodity computing + virtualization to yield
cloud computing
• Virtualization is the essential ingredient for multi-tenant
operation — but where in the stack to virtualize?
• Choices around virtualization capture tensions between
elasticity, tenancy, and performance
• tl;dr: Virtualization choices drive economic tradeoffs
3. Hardware-level virtualization?
• The historical answer — since the 1960s — has been to
virtualize at the level of the hardware:
• A virtual machine is presented upon which each
tenant runs an operating system of their choosing
• There are as many operating systems as tenants
• The singular advantage of hardware virtualization: it can
run entire legacy stacks unmodified
• However, hardware virtualization exacts a heavy price:
operating systems are not designed to share resources
like DRAM, CPU, I/O devices or the network
• Hardware virtualization limits tenancy, elasticity and
performance
4. Platform-level virtualization?
• Virtualizing at the application platform layer addresses
the tenancy challenges of hardware virtualization
• Added advantage of a much more nimble (& developer-
friendly!) abstraction…
• ...but at the cost of dictating abstraction to the developer
• This creates the “Google App Engine problem”:
developers are in a straightjacket where toy programs
are easy — but sophisticated apps are impossible
• Virtualizing at the application platform layer poses many
other challenges with respect to security, containment
and scalability
5. OS-level virtualization!
• Virtualizing at the OS level hits the sweet spot:
• Single OS (i.e., single kernel) allows for efficient use of
hardware resources, maximizing tenancy and
performance
• Disjoint instances are securely compartmentalized by
the operating system
• Gives users what appears to be a virtual machine (albeit
a very fast one) on which to run higher-level software
• The ease of a PaaS with the generality of IaaS
• Model was pioneered by FreeBSD jails and taken to
its logical extreme by Solaris zones — and then aped
by Linux containers
6. OS-level virtualization in the cloud
• Joyent runs OS containers in the cloud via SmartOS
(our illumos derivative) — and we have run containers in
multi-tenant production since ~2006
• Core SmartOS facilities are container-aware and
optimized: Zones, ZFS, DTrace, Crossbow, SMF, etc.
• SmartOS also supports hardware-level virtualization —
but we have long advocated OS-level virtualization for
new build out
• We emphasized the operational characteristics of
containers (performance, elasticity, tenancy), and for
many years we were a lone voice...
7. Containers as PaaS foundation?
• Some saw the power of OS containers to facilitate up-
stack platform-as-a-service abstractions
• For example, dotCloud — a platform-as-a-service
provider — built their PaaS on OS containers
• Hearing that many were interested in their container
orchestration layer (but not their PaaS), dotCloud open
sourced that orchestration layer...
9. Docker revolution
• Docker has used the rapid provisioning + shared
underlying filesystem of containers to allow developers
to think operationally
• Developers can encode dependencies and deployment
practices into an image
• Images can be layered, allowing for swift development
• Images can be quickly deployed — and re-deployed
• Docker will do to apt what apt did to tar
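The layering idea above can be sketched with a toy model. This is only an illustration, not how Docker stores images: here each layer is assumed to be a plain dict from path to contents, and `flatten` and `DELETED` are hypothetical names introduced for the example.

```python
# Toy model of layered images (hypothetical; not Docker's storage format).
# Each layer maps a path to file contents, or to DELETED to hide a path
# that a lower layer provided.
DELETED = None


def flatten(layers):
    """Merge layers bottom-to-top into the view a container would see."""
    view = {}
    for layer in layers:
        for path, content in layer.items():
            if content is DELETED:
                view.pop(path, None)
            else:
                view[path] = content
    return view


base = {"/bin/sh": "shell", "/etc/motd": "hello"}
deps = {"/usr/lib/libfoo.so": "libfoo 1.2"}
app_v1 = {"/app/server": "app v1", "/etc/motd": DELETED}
app_v2 = {"/app/server": "app v2"}

image_v1 = flatten([base, deps, app_v1])
# A redeploy swaps only the top layer; base and deps are reused unchanged.
image_v2 = flatten([base, deps, app_v2])
```

Because only the top layer differs between `image_v1` and `image_v2`, a rebuild or redeploy in this model touches one small dict rather than the whole filesystem.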
10. Docker’s challenges
• The Docker model is the future of containers
• Docker’s challenges are largely around production
deployment: security, network virtualization, persistence
• Security concerns are real enough that for multi-tenancy,
OS containers are currently running in hardware VMs (!!)
• With SmartOS, we have spent a decade addressing these
concerns — and are proven in production…
• Could we combine the best of both worlds?
• Could we somehow deploy Docker containers as
SmartOS zones?
11. Docker + SmartOS: Linux binaries?
• First (obvious) problem: while it has been designed to
be cross-platform, Docker is Linux-centric
• While Docker could be ported, the encyclopedia of
Docker images will likely forever remain Linux binaries
• SmartOS is Unix — but it isn’t Linux…
• Could we somehow natively emulate Linux — and run
Linux binaries directly on the SmartOS kernel?
12. OS emulation: An old idea
• Operating systems have long employed system call
emulation to allow binaries from one operating system
to run on another on the same instruction set architecture
• Combines the binary footprint of the emulated system
with the operational advantages of the emulating system
• Sun first did this with SunOS 4.x binaries on Solaris 2.x
• In mid-2000s, Sun developed zone-based OS emulation
for Solaris: branded zones
• Several brands were developed — notably including an
LX brand that allowed for Linux emulation
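As a rough user-space illustration of system call emulation (real emulation such as the LX brand lives in the kernel and is far more involved), a guest system call number can be looked up in a table of host implementations. The numbers below are the Linux x86-64 numbers for write and getpid; `emulate`, `SYSCALL_TABLE` and the `emu_*` helpers are hypothetical names for this sketch.

```python
import os

LINUX_ENOSYS = 38  # Linux errno value for "function not implemented"


def emu_write(fd, data):
    """Service a guest write(2) with the host's own write."""
    return os.write(fd, data)


def emu_getpid():
    """Service a guest getpid(2) with the host's own getpid."""
    return os.getpid()


# Guest (Linux x86-64) system call numbers mapped to host handlers.
SYSCALL_TABLE = {
    1: emu_write,    # write
    39: emu_getpid,  # getpid
}


def emulate(number, *args):
    """Dispatch a guest system call; unknown calls fail with -ENOSYS."""
    handler = SYSCALL_TABLE.get(number)
    if handler is None:
        return -LINUX_ENOSYS
    return handler(*args)
```

The table is where the work accumulates: every call a guest binary makes needs an entry whose behavior matches what the guest expects, which is why coverage of newer kernels is the hard part.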
13. LX-branded zones: Life and death
• The LX-branded zone worked for RHEL 3 (!): glibc 2.3.2
+ Linux 2.4
• Remarkable amount of work was done to handle device
pathing, signal handling, /proc — and arcana like TTY
ioctls, ptrace, etc.
• Worked for a surprising number of binaries!
• But support was only for 2.4 kernels and only for 32-bit;
2.6 + 64-bit appeared daunting…
• Support was ripped out of the system on June 11, 2010
• Fortunately, this was after the system was open sourced
in June 2005 — and the source was out there...
14. LX-branded zones: Resurrection!
• In January 2014, David Mackay, an illumos community
member, announced that he was able to resurrect the
LX brand — and that it appeared to work!
Linked below is a webrev which restores LX branded zones
support to Illumos:
http://cr.illumos.org/~webrev/DavidJX8P/lx-zones-restoration/
I have been running OpenIndiana, using it daily on my
workstation for over a month with the above webrev applied to
the illumos-gate and built by myself.
It would definitely raise interest in Illumos. Indeed, I have
seen many people who are extremely interested in LX zones.
The LX zones code is minimally invasive on Illumos itself, and
is mostly segregated out.
I hope you find this of interest.
15. LX-branded zones: Revival
• Encouraged that the LX-branded work was salvageable,
Joyent engineer Jerry Jelinek reintegrated the LX brand
into SmartOS on March 20, 2014...
• ...and started the (substantial) work to modernize it
• Guiding principles for LX-branded zone work:
• Do it all in the open
• Do it all on SmartOS master (illumos-joyent)
• Add base illumos facilities wherever possible
• Aim to upstream to illumos when we’re done
16. LX-branded zones: Progress
• With assiduous work over the course of 2014, progress
was difficult but steady:
• Ubuntu 10.04 booted in April
• Ubuntu 12.04 booted in May
• Ubuntu 14.04 booted in July
• 64-bit Ubuntu 14.04 booted in October (!)
• Going into 2015, it was becoming increasingly difficult to
find Linux software that didn’t work...
19. Docker + SmartOS: Provisioning?
• With the binary problem being tackled, focus turned to
the mechanics of integrating Docker with the SmartOS
facilities for provisioning
• Provisioning a SmartOS zone operates via the global
zone that represents the control plane of the machine
• docker is a single binary that functions as both client
and server — and with too much surface area to run in
the global zone, especially for a public cloud
• docker has also embedded Go- and Linux-isms that
we did not want in the global zone; we needed to find a
different approach...
20. Docker Remote API
• While docker is a single binary that can run as the
client or the server, it does not run as both at once…
• docker (the client) communicates with docker (the
server) via the Docker Remote API
• The Docker Remote API is expressive, modern and
robust (i.e. versioned), allowing for docker to
communicate with Docker backends that aren’t docker
• The clear approach was therefore to implement a
Docker Remote API endpoint for SmartDataCenter
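To make the idea of a backend "that isn't docker" concrete, here is a minimal sketch of request routing for two Remote API paths, `/_ping` and `/containers/json`, with or without a version prefix such as `/v1.16`. It is not sdc-docker's implementation: `handle`, `CONTAINERS` and the 404 text are assumptions for illustration, and a real endpoint would sit behind an HTTP server and cover many more routes.

```python
import json
import re

# Hypothetical in-memory stand-in for a backend's container inventory.
CONTAINERS = [
    {"Id": "abc123", "Image": "ubuntu:14.04", "Status": "Up 2 minutes"},
]

# Matches a leading version segment such as /v1.16 when a path follows it.
VERSION_PREFIX = re.compile(r"^/v\d+\.\d+(?=/)")


def handle(method, path):
    """Return (status, body) for a small subset of Remote API requests."""
    path = VERSION_PREFIX.sub("", path)  # accept versioned and bare paths
    if method == "GET" and path == "/_ping":
        return 200, "OK"
    if method == "GET" and path == "/containers/json":
        return 200, json.dumps(CONTAINERS)
    return 404, "not implemented in this sketch"
```

Since the client only sees HTTP requests and responses, anything that answers these routes correctly can stand in for the docker server.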
21. Aside: SmartDataCenter
• Orchestration software for SmartOS-based clouds
• Unlike other cloud stacks, not designed to run arbitrary
hypervisors, sell legacy hardware or get 160 companies
to agree on something
• SmartDataCenter is designed to leverage the SmartOS
differentiators: ZFS, DTrace and (esp.) zones
• Runs both the Joyent Public Cloud and business-critical
on-premises clouds at well-known brands
• Born proprietary — but made entirely open source on
November 6, 2014: http://github.com/joyent/sdc
24. SmartDataCenter + Docker
• Implementing an SDC-wide endpoint for the Docker
Remote API allows us to build in terms of our established
core services: UFDS, CNAPI, VMAPI, Image API, etc.
• Has the welcome side-effect of virtualizing the notion of
Docker host machine: Docker containers can be placed
anywhere within the data center
• From a developer perspective, one less thing to manage
• From an operations perspective, allows for a flexible
layer of management and control: Docker API endpoints
become a potential administrative nexus
• As such, virtualizing the Docker host is somewhat
analogous to the way ZFS virtualized the filesystem...
25. SmartDataCenter + Docker: Challenges
• Some Docker constructs have (implicitly) encoded co-
locality of Docker containers on a physical machine
• Some of these constructs (e.g., --volumes-from) we
will discourage but accommodate by co-scheduling
• Others (e.g., host directory-based volumes) we are
implementing via NFS backed by Manta, our (open
source!) distributed object storage service
• Moving forward, we are working with Docker to help
assure that the Docker Remote API doesn’t create new
implicit dependencies on physical locality
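A small sketch of what a virtualized Docker host with co-scheduling could look like: containers go to whichever compute node has the most free memory, unless they share volumes with an existing container, in which case they follow it. This is an assumed policy for illustration only, not SDC's actual placement logic; `place`, the node names and the memory figures are all hypothetical.

```python
def place(nodes, placements, name, memory_mb, volumes_from=None):
    """Pick a compute node for a new container and record the choice.

    nodes:       dict of node name -> free memory in MB (updated in place)
    placements:  dict of container name -> node name (updated in place)
    """
    if volumes_from is not None:
        # Co-schedule with the container whose volumes are being shared.
        candidates = [placements[volumes_from]]
    else:
        candidates = list(nodes)
    fitting = [n for n in candidates if nodes[n] >= memory_mb]
    if not fitting:
        raise RuntimeError("no compute node can satisfy the request")
    chosen = max(fitting, key=lambda n: nodes[n])
    nodes[chosen] -= memory_mb
    placements[name] = chosen
    return chosen


nodes = {"cn-a": 4096, "cn-b": 8192}
placements = {}
place(nodes, placements, "db", 2048)     # lands on the emptiest node
place(nodes, placements, "cache", 4096)  # still the emptiest node
# Shares volumes with "db", so it follows "db" even though cn-a is emptier.
place(nodes, placements, "backup", 1024, volumes_from="db")
```

The co-scheduling branch shows the cost of locality constraints: it shrinks the candidate set to a single node, so the request can fail even when the data center as a whole has room.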
26. SmartDataCenter + Docker: Networking
• Parallel to our SmartOS and Docker work, we have
been working on next-generation software-defined
networking for SmartOS and SmartDataCenter
• Goal was to use standard encapsulation/decapsulation
protocols (i.e., VXLAN) for overlay networks
• We have taken a kernel-based (and ARP-inspired)
approach to assure scale
• Complements SDC’s existing in-kernel, API-managed
firewall facilities
• All done in the open: on the dev-overlay branch of
SmartOS (illumos-joyent) and as sdc-portolan
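For readers unfamiliar with VXLAN, the sketch below packs and unpacks the 8-byte VXLAN header (a flags byte with the VNI-valid bit, a 24-bit network identifier, and reserved fields) and pairs it with an ARP-like lookup from an overlay MAC address to an underlay IP. The header layout follows the VXLAN specification; the lookup table, its addresses and the function names are hypothetical, and the outer UDP/IP headers are omitted. The actual SmartOS work is done in the kernel and is not shown here.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" bit: the VNI field is valid


def vxlan_header(vni):
    """Build the 8-byte VXLAN header for a 24-bit network identifier."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags (8 bits) + reserved (24 bits)
    # Word 1: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(vni, inner_frame):
    """Prefix an inner Ethernet frame with a VXLAN header."""
    return vxlan_header(vni) + inner_frame


def decapsulate(packet):
    """Split a VXLAN payload into (vni, inner_frame)."""
    word0, word1 = struct.unpack("!II", packet[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return word1 >> 8, packet[8:]


# Hypothetical ARP-like table: (VNI, overlay MAC) -> underlay IP address.
UNDERLAY = {(42, "02:00:00:00:00:01"): "10.0.0.5"}


def underlay_for(vni, mac):
    """Return the underlay address hosting an overlay MAC, or None."""
    return UNDERLAY.get((vni, mac))
```

Keying the lookup on the VNI as well as the MAC is what lets tenants reuse the same overlay addresses without colliding.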
27. Putting it all together: sdc-docker
• Our Docker engine for SDC, sdc-docker, implements
the endpoints for the Docker Remote API
• Work is young (started in earnest in early fall 2014), but
because it takes advantage of a proven orchestration
substrate, progress has been very quick…
• We will be deploying it into early access production in
the Joyent Public Cloud in Q1 2015
• It’s open source: http://github.com/joyent/sdc-docker;
you can install SDC (either on hardware or on VMware)
and check it out for yourself!
• A demo is worth a thousand slides...
28. Future of containers in production
• For nearly a decade, we at Joyent have believed that
OS-virtualized containers are the future of computing
• While the efficiency gains are tremendous, they have
not alone been enough to propel containers into the
mainstream
• We believe that the developer ease of Docker combined
with the proven production substrate of SmartOS and
SmartDataCenter yields the best of all worlds
• The future of containers is one without compromise:
developer efficiency, operational elasticity, multi-tenant
security and on-the-metal performance!
29. Thank you!
• Jerry Jelinek, @pfmooney, @jmclulow and @jperkin for
their work on LX-branded zones
• @joshwilsdon, @trentmick, @cachafla and @orlandov
for their work on sdc-docker
• @rmustacc, @wayfaringrob, @fredfkuo and @notmatt
for their work on SDC overlay networking
• The countless engineers who have worked on or with
illumos because they believed in OS-based virtualization