LXC – Next Gen Virtualization for Cloud: Intro (Cloud Expo)
1. Linux Containers – NextGen Virtualization for Cloud (Intro & Overview)
Cloud Expo
June 10-12, 2014
New York City, NY
Boden Russell (brussell@us.ibm.com)
2. Why LXC: Performance
6/13/2014 2
Provision time
– Manual: days
– VM: minutes
– LXC: seconds / ms
[Figure: linpack performance @ 45000; GFlops (0 to 250) vs. vcpus (1 to 31); series labeled B and M]
3. Why LXC: Industry Uptrend
Google trends - LXC
Google trends - docker
5. Why LXC: Lower TCO
Supported out of the box by a modern Linux kernel
Open source toolsets
Cloudy integration
6. Definitions
Linux Containers (LXC: LinuX Containers)
– Lightweight virtualization
– Realized using features provided by a modern Linux kernel
– VMs without the hypervisor (kind of)
Containerization of
– (Linux) Operating Systems
– Single or multiple applications
LXC as a technology ≠ LXC “tools”
7. Hypervisors vs. Linux Containers
[Figure: three stacks compared, bottom to top]
Type 1 Hypervisor: Hardware, Hypervisor, Virtual Machines (each with its own Operating System, Bins / libs, Apps)
Type 2 Hypervisor: Hardware, Operating System, Hypervisor, Virtual Machines (each with its own Operating System, Bins / libs, Apps)
Linux Containers: Hardware, Operating System, Containers (each with only Bins / libs, Apps)
Containers share the OS kernel of the host and thus are lightweight.
However, this means every container on a host runs on the same OS kernel.
Containers are isolated, but share the OS and, where appropriate, libs / bins.
8. LXC Technology Stack
[Figure: LXC technology stack, bottom to top]
Hardware
Kernel space: Architecture Dependent Kernel Code; Kernel (cgroups, namespaces, chroots, LSM); System Call Interface
User space: GLIBC / Pseudo FS / User Space Tools & Libs; Linux Container Tooling (lxc); Linux Container Commoditization; Orchestration & Management
9. So You Want To Build A Container?
High level checklist
– Process(es)
– Throttling / limits
– Prioritization
– Resource isolation
– Root file system
– Security
10. Linux Control Groups (cgroups)
Problem
– How do I throttle, prioritize, control and obtain metrics for a group of
tasks (processes)?
Solution: control groups (cgroups)
[Figure: cgroup “blue” grouping three processes]
– Device Access
– Resource limiting
– Prioritization
– Accounting
– Control
– Injection
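As a rough illustration of how a group of tasks is controlled, the sketch below builds the list of file writes that would configure one cgroup through the cgroup (v1) virtual filesystem. It only plans the writes and touches nothing on the host. The mount point `/sys/fs/cgroup`, the group name `blue`, the PID, and the helper name `plan_cgroup` are assumptions made for the example; the file names `cpu.shares`, `memory.limit_in_bytes` and `tasks` are the standard cgroup v1 control files.

```python
import posixpath

CGROUP_ROOT = "/sys/fs/cgroup"  # assumed cgroup v1 mount point; varies by distro


def plan_cgroup(group, pids, settings):
    """Return the (path, value) writes that would configure one cgroup.

    settings maps "subsystem.parameter" file names (for example
    "memory.limit_in_bytes") to values. Nothing is written here; a
    privileged caller would first create the group directories (mkdir)
    and then apply each write.
    """
    writes = []
    subsystems = set()
    for filename, value in sorted(settings.items()):
        subsystem = filename.split(".", 1)[0]
        subsystems.add(subsystem)
        path = posixpath.join(CGROUP_ROOT, subsystem, group, filename)
        writes.append((path, str(value)))
    # Writing a PID to the "tasks" file places that process in the group.
    for subsystem in sorted(subsystems):
        for pid in pids:
            path = posixpath.join(CGROUP_ROOT, subsystem, group, "tasks")
            writes.append((path, str(pid)))
    return writes


plan = plan_cgroup("blue", [1234], {"cpu.shares": 512, "memory.limit_in_bytes": 268435456})
for path, value in plan:
    print(f"echo {value} > {path}")
```

Keeping the plan separate from its execution makes the example safe to run unprivileged; applying it for real requires root and a host that mounts cgroup v1 at the assumed location.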
11. Linux cgroup Subsystems
Subsystem   Tunable Parameters
blkio       - Weighted proportional block I/O access. Group wide or per device.
            - Per device hard limits on block I/O read/write, specified as bytes per second or I/O operations per second (IOPS).
cpu         - Time period (microseconds per second) a group should have CPU access.
            - Group wide upper limit on CPU time per second.
            - Weighted proportional value of relative CPU time for a group.
cpuset      - CPUs (cores) the group can access.
            - Memory nodes the group can access, and migration ability.
            - Memory hardwall, pressure, spread, etc.
devices     - Define which devices and access type a group can use.
freezer     - Suspend/resume group tasks.
memory      - Max memory limits for the group (in bytes).
            - Memory swappiness, OOM control, hierarchy, etc.
hugetlb     - Limit HugeTLB size usage.
            - Per cgroup HugeTLB metrics.
net_cls     - Tag network packets with a class ID.
            - Use tc to prioritize tagged packets.
net_prio    - Weighted proportional priority on egress traffic (per interface).
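To make two of these tunables concrete, here is a small sketch that prepares values for the memory and blkio subsystems: a byte count suitable for `memory.limit_in_bytes`, and a `major:minor bytes_per_second` line in the format accepted by `blkio.throttle.read_bps_device`. The helper names and the device numbers 8:0 are illustrative assumptions, not part of the original slides.

```python
def parse_size(text):
    """Convert a size such as "256M" or "1G" into bytes (binary multiples)."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def blkio_throttle_entry(major, minor, bytes_per_second):
    """Format one line for blkio.throttle.read_bps_device or write_bps_device."""
    return f"{major}:{minor} {bytes_per_second}"


# Value for memory.limit_in_bytes (a hard cap of 256 MiB for the group).
memory_limit = parse_size("256M")
# Cap reads on block device 8:0 (an assumed device) at 10 MiB per second.
read_limit = blkio_throttle_entry(8, 0, parse_size("10M"))
print(memory_limit)
print(read_limit)
```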
14. Linux cgroups: CPU Usage
Use CPU shares (and other controls) to prioritize jobs / containers
Carry out complex scheduling schemes
Segment host resources
Adhere to SLAs
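CPU shares are relative weights, so the time a group receives depends on the weights of the other busy groups. The sketch below works through that arithmetic for three hypothetical groups; the group names and weights are made up for illustration, and the split only applies when every group is competing for CPU (an idle host lets any group use spare cycles).

```python
def cpu_time_fractions(shares):
    """Map each group to its fraction of CPU time when all groups are busy."""
    total = sum(shares.values())
    return {name: weight / total for name, weight in shares.items()}


# Hypothetical cpu.shares values for three containers.
fractions = cpu_time_fractions({"web": 1024, "batch": 512, "backup": 512})
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.0%}")
```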
15. Linux cgroups: CPU Pinning
Pin containers / jobs to CPU cores
Carry out complex scheduling schemes
Reduce core switching costs
Adhere to SLAs
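Pinning is expressed through the cpuset subsystem, whose `cpuset.cpus` file takes a list of cores and ranges such as `0-2,5`. The following sketch renders a set of core numbers in that list format; the function name `format_cpuset` is an assumption for the example, and writing the result into a cgroup is left to the caller.

```python
def format_cpuset(cores):
    """Render core numbers in the list format used by cpuset.cpus."""
    ordered = sorted(set(cores))
    ranges = []
    for core in ordered:
        if ranges and core == ranges[-1][1] + 1:
            ranges[-1][1] = core         # extend the current run of cores
        else:
            ranges.append([core, core])  # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


print(format_cpuset([0, 1, 2, 5]))
```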
18. Linux namespaces
Problem
– How do I provide an isolated view of global resources to a group of tasks
(processes)?
Solution: namespaces
– MNT: mount points, file systems, etc.
– PID: processes
– NET: NICs, routing, etc.
– IPC: System V IPC
– UTS: host and domain name
– USER: UID and GID
[Figure: namespace “blue” (MNT, PID, NET, UTS, USER) containing three processes]
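One way to see which namespaces a process belongs to is to read the symbolic links under `/proc/<pid>/ns`, where each link target names the namespace type and an identifier. The sketch below does that; the function names are assumptions for the example, and the `proc_root` parameter exists only so the code can be pointed at a different directory for testing.

```python
import os


def namespace_ids(pid="self", proc_root="/proc"):
    """Return {namespace type: identifier} from the symlinks in /proc/<pid>/ns."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    ids = {}
    for name in sorted(os.listdir(ns_dir)):
        # Each entry is a symlink whose target looks like "net:[4026531956]".
        ids[name] = os.readlink(os.path.join(ns_dir, name))
    return ids


def share_namespace(ids_a, ids_b, ns_type):
    """Two processes are in the same namespace when the identifiers match."""
    return ids_a[ns_type] == ids_b[ns_type]


# Only attempt the lookup on hosts that expose namespace links.
if os.path.isdir("/proc/self/ns"):
    for ns_type, ident in namespace_ids().items():
        print(ns_type, ident)
```

Comparing the identifiers of a containerized process with those of a host process shows which global resources are isolated for it and which are still shared.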
20. Linux namespaces: Common Idioms
It’s not required to use all namespaces
– Pick & choose; if your toolset allows it
Constructs exist to permit “connectivity” between parent / child namespaces
Various Linux user space tools have namespace support
The Linux system call API (clone, unshare, setns) supports flexible namespace creation
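The pick-and-choose idiom shows up in the system call API as a bitmask: each namespace type has its own flag, and a caller ORs together only the ones it wants. The sketch below combines flags for a chosen subset. The flag values are those defined in the kernel's `<linux/sched.h>`; the dictionary keys and the helper name are assumptions for the example, and no system call is actually made.

```python
# Flag values as defined in the kernel's <linux/sched.h>.
CLONE_FLAGS = {
    "MNT": 0x00020000,   # CLONE_NEWNS
    "UTS": 0x04000000,   # CLONE_NEWUTS
    "IPC": 0x08000000,   # CLONE_NEWIPC
    "USER": 0x10000000,  # CLONE_NEWUSER
    "PID": 0x20000000,   # CLONE_NEWPID
    "NET": 0x40000000,   # CLONE_NEWNET
}


def namespace_flags(selected):
    """OR together the clone/unshare flags for the selected namespace types."""
    flags = 0
    for name in selected:
        flags |= CLONE_FLAGS[name]
    return flags


# Isolate only the hostname and the network stack; share everything else.
print(hex(namespace_flags(["UTS", "NET"])))
```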
21. Linux namespaces & cgroups: Availability
Note: user namespace support is in upstream kernel 3.8+, but distributions are rolling out phased support:
– Map LXC UID/GID between container and host
– Non-root LXC creation
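UID/GID mapping can be illustrated with the three-column format the kernel uses in `/proc/<pid>/uid_map` and `gid_map`: first ID inside the namespace, first ID outside, and the length of the range. The sketch below translates a container ID to the corresponding host ID; the sample mapping `0 100000 65536` and the function names are assumptions for the example.

```python
def parse_id_map(text):
    """Parse uid_map / gid_map lines: "<id inside> <id outside> <length>"."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries


def to_host_id(container_id, id_map):
    """Translate an ID seen inside the container to the ID used on the host."""
    for inside, outside, length in id_map:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    return None  # the ID is not mapped


# Assumed mapping: container IDs 0..65535 map to host IDs 100000..165535.
id_map = parse_id_map("0 100000 65536\n")
print(to_host_id(0, id_map))
print(to_host_id(1000, id_map))
```

With a mapping like this, root inside the container (UID 0) is an unprivileged UID on the host, which is what makes non-root container creation possible.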
23. Linux chroot & pivot_root
Using pivot_root with a MNT namespace addresses concerns about escaping a chroot
The pivot_root target directory becomes the “new root FS”
24. LXC Images
LXC images provide a flexible means to deliver only what you need: lightweight and minimal footprint
Basic constraints
– Same architecture & endianness
– Linux’ish operating system; you can run different Linux distros on the same host
Image types
– System; virtualize operating system(s) – standard distro root FS less the kernel
– Application; virtualize application(s) – only package apps + dependencies (aka JeOS – Just enough Operating System)
Bind mount host libs / bins into LXC to share host resources
Container image init process
– Container init command provided on invocation – can be an application or a full-fledged init process
– Init script customized for image – skinny SysVinit, upstart, etc.
– Reduces overhead of LXC start-up and runtime footprint
Various tools to build images
– SuSE Kiwi
– Debootstrap
– Etc.
LXC tooling options often include numerous image templates
26. Linux Security Modules & MAC
Linux Security Modules (LSM) – kernel modules which provide a framework for Mandatory Access Control (MAC) security implementations
MAC vs DAC
– In MAC, an admin (user or process) assigns access controls to the subject / initiator
– In DAC, the resource owner (user) assigns access controls to individual resources
Existing LSM implementations include: AppArmor, SELinux, GRSEC, etc.
27. Linux Capabilities
Per-process privileges which define system call access
Can be assigned to LXC process(es)
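Capabilities are stored per process as a bitmask, which the kernel exposes in hexadecimal on lines such as `CapEff` in `/proc/<pid>/status`. The sketch below decodes such a mask into capability names. The bit positions come from the kernel's `<linux/capability.h>`, but only a small subset is listed here, and the sample mask and function name are assumptions for the example.

```python
# Bit positions from the kernel's <linux/capability.h> (subset only).
CAPABILITY_BITS = {
    0: "CAP_CHOWN",
    10: "CAP_NET_BIND_SERVICE",
    12: "CAP_NET_ADMIN",
    13: "CAP_NET_RAW",
    18: "CAP_SYS_CHROOT",
    21: "CAP_SYS_ADMIN",
}


def decode_capabilities(mask_hex):
    """Return the known capability names set in a hexadecimal mask."""
    mask = int(mask_hex, 16)
    return [name for bit, name in sorted(CAPABILITY_BITS.items()) if mask & (1 << bit)]


# Sample mask with bits 10 and 21 set.
print(decode_capabilities("0000000000200400"))
```

Dropping bits from this mask before starting a container process is how tooling narrows what that process may ask the kernel to do, even when it runs as root.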
28. Other Security Measures
Reduce shared FS access using RO bind mounts
Linux seccomp
– Confine system calls
Keep Linux kernel up to date
User namespaces in 3.8+ kernel
– Launching containers as non-root user
– Mapping UID / GID into container
30. LXC Industry Tooling
Virtuozzo
– Summary: Commercial product using OpenVZ under the hood
– Part of upstream kernel? No
– License: Commercial
OpenVZ
– Summary: Custom kernel providing well seasoned LXC support
– Part of upstream kernel? No
– License: GNU GPL v2
Linux VServer
– Summary: A set of kernel patches providing LXC. Not based on cgroups or namespaces.
– Part of upstream kernel? Partial
– License: GNU GPL v2
Libvirt-lxc
– Summary: Libvirt support for LXC via cgroups and namespaces.
– Part of upstream kernel? Yes
– License: GNU LGPL
Lxc (tools)
– Summary: Lib + set of user space tools / bindings for LXC.
– Part of upstream kernel? Yes
– License: GNU LGPL
Warden
– Summary: LXC management tooling used by CF.
– Part of upstream kernel? Yes
– License: Apache v2
lmctfy
– Summary: Similar to LXC, but provides more intent based focus.
– Part of upstream kernel? Yes, but additional patches needed for specific features.
– License: Apache v2
Docker
– Summary: Commoditization of LXC adding support for images, build files, etc.
– Part of upstream kernel? Yes
– License: Apache v2
APIs / Bindings (across tools, in slide order): CLI, API, CLI, C, CLI, C, Python, Java, C#, PHP, Python, Lua, Go, CLI, Go, REST, CLI, Python, other 3rd party libs
Management plane / Dashboard (across tools, in slide order): Virtuozzo, Parallels, Virtuozzo, Parallels + others, OpenStack, Archipel, Virt-Manager, LXC web panel, Lexy, OpenStack, Shipyard, Docker UI
31. LXC Orchestration & Management
Docker & libvirt-lxc in OpenStack
– Manage containers heterogeneously with traditional VMs… but not with the level of support & features we might like
CoreOS
– Zero-touch admin Linux distro with Docker images as the unit of operation
– Centralized key/value store to coordinate distributed environment
Various other 3rd party apps
– Maestro for Docker
– Shipyard for Docker
– Fleet for CoreOS
– Etc.
LXC migration
– Container migration via CRIU
But…
– Still no great way to tie all virtual resources together with LXC – e.g. storage + networking
• IMO, an area which needs focus for LXC to become more generally applicable
32. LXC Gaps
There are gaps…
Lack of industry tooling / support
Live migration still a WIP
Full orchestration across resources (compute / storage / networking)
Fears of security
Not a well known technology… yet
Integration with existing virtualization and Cloud tooling
Few, if any, industry standards
Missing skillset
Slower upstream support due to kernel dev process
Etc.
33. LXC: Use Cases For Traditional VMs
There are still use cases where traditional VMs are warranted.
Virtualization of non-Linux-based OSs
– Windows
– AIX
– Etc.
LXC not supported on host
VM requires unique kernel setup which is not applicable to other VMs on the host (i.e. per VM kernel config)
Etc.