1. Let Me Contain That For You
Containers @ Google
Victor Marmol (vmarmol@google.com)
Rohit Jnagal (jnagal@google.com)
SF Bay Area Large-Scale Production Engineering: Lightweight Containers Meetup
February 20, 2014
Google Confidential and Proprietary
2. Containers in the Wild
[Diagram: User 1, User 2, User 3, and User 4 containers on a shared Linux Kernel]
● Used to provide VM-like instances
● High density (lower costs) and high performance
● Fast to start
● Migration is hard, but possible
3. The Need for Isolation: A Shared Google Machine
[Diagram of a shared machine. Labels: I/O:CPU:Mem; TASKS; BACKGROUND; Sensitive Task; Front End Task; Back End Task; Alloc; System Daemons; Batch workload; Soaker workload]
4. Containers @ Google
[Diagram: containers SS1 to SS4 with subcontainers Sub 1 to Sub 4, tasks Task 1 and Task 2, and Alloc 1, all on a shared Linux Kernel]
● Container-aware tasks use asymmetric subcontainers
● Provide different guarantees of quality of service
● Overcommit resources to achieve high utilization
● Early users, few namespaces, and near-zero overhead
5. Asymmetric Isolation
Isolating only certain resources (e.g., CPU but not memory).
[Diagram: Container 1, Container 2, and Container 3 shown against the resources CPU, Memory, and Net]
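One way to picture asymmetric isolation is a specification in which every resource is optional, so a container can be limited on CPU while memory and network stay shared. The sketch below is illustrative only: `Resource`, `IsolationSpec`, `IsIsolated`, and `CpuOnly` are hypothetical names, not lmctfy types.

```cpp
#include <cstdint>
#include <map>

// Hypothetical resource kinds, mirroring the three shown on the slide.
enum class Resource { kCpu, kMemory, kNet };

// Hypothetical spec type for illustration; not lmctfy's actual ContainerSpec.
struct IsolationSpec {
  // A resource absent from this map is left unisolated.
  std::map<Resource, std::uint64_t> limits;
};

// Returns true if the spec requests isolation for the given resource.
bool IsIsolated(const IsolationSpec& spec, Resource resource) {
  return spec.limits.count(resource) > 0;
}

// Builds a spec that isolates CPU only (e.g., CPU but not memory).
IsolationSpec CpuOnly(std::uint64_t cpu_millicores) {
  IsolationSpec spec;
  spec.limits[Resource::kCpu] = cpu_millicores;
  return spec;
}
```

A spec built with `CpuOnly(500)` reports CPU as isolated and leaves memory and network untouched.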
6. Containers @ Google Today
● Historically
○ 2004: No isolation
○ 2006: Cgroups
○ Now: Namespaces
● Primarily Linux cgroups + user-space policies and monitoring
● We skipped VMs due to high overhead
● Used everywhere: SaaS, PaaS, IaaS; Android, Chrome OS
● Heterogeneous workloads: Latency, bandwidth, and priority
● High task churn
7. Goals
● Isolation
○ Tasks do not impact each other
○ The behavior of a Task is the same regardless of what else is on the machine
● Predictability
○ Tasks behave the same each time they run
○ Unless they are specifically configured to use "slack"
● Quality of Service
○ Different tasks get different quality of resources
● Overcommitment
○ Oversell machine resources within QoS guarantees
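Overselling within QoS guarantees can be read as an admission rule: guaranteed work must fit in real capacity, while the total including best-effort work may exceed it by a bounded factor. The sketch below assumes exactly that policy. The two tiers, the `overcommit_percent` parameter, and all names are assumptions made for illustration, not Google's actual policy.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical QoS tiers; the slides do not name the real ones.
enum class Tier { kGuaranteed, kBestEffort };

struct Request {
  Tier tier;
  std::uint64_t cpu_millis;
};

// Assumed policy: guaranteed requests must fit in real capacity, and the
// total of all requests may reach capacity * overcommit_percent / 100.
bool CanAdmit(const std::vector<Request>& running, const Request& incoming,
              std::uint64_t capacity_millis,
              std::uint64_t overcommit_percent) {
  std::uint64_t guaranteed = 0;
  std::uint64_t total = 0;
  for (const Request& r : running) {
    total += r.cpu_millis;
    if (r.tier == Tier::kGuaranteed) guaranteed += r.cpu_millis;
  }
  total += incoming.cpu_millis;
  if (incoming.tier == Tier::kGuaranteed) guaranteed += incoming.cpu_millis;
  return guaranteed <= capacity_millis &&
         total <= capacity_millis * overcommit_percent / 100;
}
```

With 4000 millicores of capacity and a 150% overcommit limit, a machine already running 3000 guaranteed and 2000 best-effort millicores can still accept 1000 more best-effort millicores, but not 1500 more guaranteed ones.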
8. lmctfy: Let Me Contain That For You
Open-source container stack based on Google’s own.
github.com/google/lmctfy/
Provides the Container abstraction to higher levels by abstracting away the kernel interfaces.
Motivation
● Existing code, systems, and design around containers
● Problems with LXC
○ No abstraction (direct knob exposure)
○ No easy way to access programmatically
9. lmctfy: Let Me Contain That For You
Objectives
● Abstract away enforcement: separate policy from enforcement
● Scalability and parallel access
● Intent-based container specifications
● Asymmetric isolation
● Subcontainer support
● Provides tiers of quality of service
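An intent-based specification lets the caller state what a container needs and leaves the choice of kernel knobs to the container layer. The sketch below shows the idea for cgroup v1, whose `cpu.shares` and `memory.limit_in_bytes` control files do exist. The `Intent` type, the `ToCgroupKnobs` function, and the 1000 millicores = 1024 shares conversion are assumptions for illustration, not lmctfy's actual translation.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical intent-level request: what the task needs, not how to enforce it.
struct Intent {
  std::uint64_t cpu_millicores;
  std::uint64_t memory_bytes;
};

// Translates intent into cgroup v1 control-file values.
// Assumption: one full core (1000 millicores) maps to 1024 CPU shares.
std::map<std::string, std::string> ToCgroupKnobs(const Intent& intent) {
  std::map<std::string, std::string> knobs;
  knobs["cpu.shares"] = std::to_string(intent.cpu_millicores * 1024 / 1000);
  knobs["memory.limit_in_bytes"] = std::to_string(intent.memory_bytes);
  return knobs;
}
```

Callers only ever see `Intent`; if the enforcement mechanism changes, only the translation function has to change with it.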
System Layers
● CL1
○ Container abstraction and enforcement
○ Thin and light layer
○ Current lmctfy
● CL2
○ Sets policy (QoS, overcommitment)
○ Higher level logic, monitoring, and control loops
○ Stateful entity
11. Released 0.4.0 (This Week!)
Initial version of lowest layer
● Written entirely in C++
● Delivered as a CLI and a C++ library (C and Go bindings soon)
● Isolation for CPU, memory, and perf event
● Full support for subcontainers
● “Stateless” and lightweight
● Initial support for namespaces; more to come in the next week
Can be augmented with custom kernel patches
● CPU latency and accounting
● OOM priority
Supported configurations
● Target configuration is well supported
● Designed to be flexible, but tested on only a limited set of configurations
● More target configurations being added
● Contributions to add more are welcome
14. C++ API
::containers::lmctfy::ContainerApi
● Create
● Get
● Destroy
● Detect
● InitMachine
::containers::lmctfy::Container
● Update
● Run
● Notifications
● List (threads, PIDs, and subcontainers)
● Stats
● Pause/Resume
● KillAll
CLI is a thin wrapper around the C++ API
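To show how the two classes divide the work, here is a small in-memory stand-in: the API object owns the set of containers (Create, Get, Destroy), and each container object carries the per-container operations (Run, KillAll). `FakeContainerApi` and `FakeContainer` are hypothetical; the method names follow the slide, but the signatures and behavior are assumptions and do not match the real lmctfy headers.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// In-memory stand-in for illustration; not the real lmctfy Container.
class FakeContainer {
 public:
  explicit FakeContainer(std::string name) : name_(std::move(name)) {}
  const std::string& name() const { return name_; }

  // Records the command instead of executing it; returns a fake PID.
  int Run(const std::string& command) {
    commands_.push_back(command);
    return static_cast<int>(commands_.size());
  }

  // Forgets every recorded command, standing in for killing all processes.
  void KillAll() { commands_.clear(); }

  std::size_t ProcessCount() const { return commands_.size(); }

 private:
  std::string name_;
  std::vector<std::string> commands_;
};

// In-memory stand-in for illustration; not the real lmctfy ContainerApi.
class FakeContainerApi {
 public:
  // Returns nullptr if a container with this name already exists.
  FakeContainer* Create(const std::string& name) {
    if (containers_.count(name) > 0) return nullptr;
    std::unique_ptr<FakeContainer>& slot = containers_[name];
    slot.reset(new FakeContainer(name));
    return slot.get();
  }

  // Returns nullptr if no container has this name.
  FakeContainer* Get(const std::string& name) {
    auto it = containers_.find(name);
    return it == containers_.end() ? nullptr : it->second.get();
  }

  // Returns true if a container was removed.
  bool Destroy(const std::string& name) { return containers_.erase(name) > 0; }

 private:
  std::map<std::string, std::unique_ptr<FakeContainer>> containers_;
};
```

A CLI built on this shape would map each subcommand to one of these calls, which is what makes it a thin wrapper.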
15. Container Names
Path-like hierarchy of container names:
Absolute: /parent/self
Relative: self when in /parent
Container Name                     Refers To
/                                  The root top-level container
/sys                               The sys top-level container
/sys/sub                           The sub subcontainer of the sys top-level container
. or ./                            The current container (current relative to the calling process)
..                                 The parent container (parent relative to the calling process)
./foo_container or foo_container   The foo_container subcontainer of the current container
/foo_container                     The foo_container top-level container
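The naming rules above amount to path resolution against the caller's current container. A minimal sketch follows, assuming the current container is passed in as an absolute name. `ResolveContainerName` is a hypothetical helper rather than an lmctfy function, and clamping `..` at the root is an assumption the slide does not cover.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Resolves a container name to its absolute form.
// `current` must be the absolute name of the caller's container.
std::string ResolveContainerName(const std::string& name,
                                 const std::string& current) {
  // Absolute names stand alone; relative names start from `current`.
  const bool is_absolute = !name.empty() && name[0] == '/';
  const std::string full = is_absolute ? name : current + "/" + name;

  std::vector<std::string> parts;
  std::istringstream in(full);
  std::string segment;
  while (std::getline(in, segment, '/')) {
    if (segment.empty() || segment == ".") continue;  // "//" and "." are no-ops.
    if (segment == "..") {
      // Assumption: the root has no parent, so ".." at the root stays there.
      if (!parts.empty()) parts.pop_back();
      continue;
    }
    parts.push_back(segment);
  }

  std::string result;
  for (const std::string& part : parts) result += "/" + part;
  return result.empty() ? "/" : result;
}
```

Called from inside `/sys`, both `sub` and `./sub` resolve to `/sys/sub`, `..` resolves to `/`, and `/foo_container` is returned unchanged.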
16. Roadmap
Towards Version 1.0
● Improve VirtualHost support
● Root file systems
● Checkpoint restore
● Support and target most major distros
● Fully compatible with Docker’s use of containers
Higher Layer
● Admission control and feasibility checks
● Monitoring, notifications, and statistics
● Tiers of quality of service guarantees
Contributions Welcome!