SlideShare a Scribd company logo
1 of 29
Linux Containers (LXC)
IEEE P2302 Working Group Brief
Boden Russell (brussell@us.ibm.com /
bodenru@gmail.com)
Definitions
 Linux Containers (LXC  LinuX Containers)
– Lightweight virtualization
– Realized using features provided by a modern Linux kernel
– VMs without the hypervisor (kind of)
 Containerization of
– (Linux) Operating Systems (less the kernel)
– Single or multiple applications
 LXC as a technology ≠ LXC “tools”
6/27/2014 2
Why LXC
 Provision in seconds / milliseconds
 Near bare metal runtime performance
 VM-like agility – it’s still “virtualization”
 Flexibility
– Containerize a “system”
– Containerize “application(s)”
 Lightweight
– Just enough Operating System (JeOS)
– Minimal per container penalty
 Open source – free – lower TCO
 Supported with OOTB modern Linux kernel
 Growing in popularity
6/27/2014 3
“Linux Containers are poised as the next VM in our modern Cloud era…”
Manual VM LXC
Provision Time
Days
Minutes
Seconds / ms
linpack performance @ 45000
0
50
100
150
200
250
1
3
5
7
9
11
13
15
17
19
21
23
25
27
29
31
BM
vcpus
GFlops
Google trends - LXC Google trends - docker
LXC Technology Stack
6/27/2014 4
UserSpaceKernelSpace
Kernel
System Call Interface
Architecture Dependent Kernel Code
GLIBC / Pseudo FS / User Space Tools & Libs
Linux Container Tooling
Linux Container Commoditization
Orchestration & Management
Hardware
cgroups
namespaces
chroots
LSM
lxc
Hypervisors vs. Linux Containers
6/27/2014 5
Hardware
Operating System
Hypervisor
Virtual Machine
Operating
System
Bins / libs
App App
Virtual Machine
Operating
System
Bins / libs
App App
Hardware
Hypervisor
Virtual Machine
Operating
System
Bins / libs
App App
Virtual Machine
Operating
System
Bins / libs
App App
Hardware
Operating System
Container
Bins / libs
App App
Container
Bins / libs
App App
Type 1 Hypervisor Type 2 Hypervisor Linux Containers
Containers share the OS kernel of the host and thus are lightweight.
However, each container must have the same OS kernel.
Containers are isolated, but
share OS and, where
appropriate, libs / bins.
Linux cgroups
 History
– Work started in 2006 by google engineers
– Merged into upstream 2.6.24 kernel due to wider spread LXC usage
– A number of features still a WIP
 Functionality
– Access; which devices can be used per cgroup
– Resource limiting; memory, CPU, device accessibility, block I/O, etc.
– Prioritization; who gets more of the CPU, memory, etc.
– Accounting; resource usage per cgroup
– Control; freezing & check pointing
– Injection; packet tagging
 Usage
– cgroup functionality exposed as “resource controllers” (aka “subsystems”)
– Subsystems mounted on FS
– Top-level subsystem mount is the root cgroup; all procs on host
– Directories under top-level mounts created per cgroup
– Procs put in tasks file for group assignment
– Interface via read / write pseudo files in group
6/27/2014 6
Cgroups provide throttling, prioritization, control and metrics for a group of processes.
Linux cgroups Subsystems
 cgroups provided via kernel modules
– Not always loaded / provided by default
– Locate and load with modprobe
 Some features tied to kernel version
 See: https://www.kernel.org/doc/Documentation/cgroups/
6/27/2014 7
Subsystem Tunable Parameters
blkio - Weighted proportional block I/O access. Group wide or per device.
- Per device hard limits on block I/O read/write specified as bytes per second or IOPS per second.
cpu - Time period (microseconds per second) a group should have CPU access.
- Group wide upper limit on CPU time per second.
- Weighted proportional value of relative CPU time for a group.
cpuset - CPUs (cores) the group can access.
- Memory nodes the group can access and migrate ability.
- Memory hardwall, pressure, spread, etc.
devices - Define which devices and access type a group can use.
freezer - Suspend/resume group tasks.
memory - Max memory limits for the group (in bytes).
- Memory swappiness, OOM control, hierarchy, etc..
hugetlb - Limit HugeTLB size usage.
- Per cgroup HugeTLB metrics.
net_cls - Tag network packets with a class ID.
- Use tc to prioritize tagged packets.
net_prio - Weighted proportional priority on egress traffic (per interface).
Linux namespaces
 History
– Initial kernel patches in 2.4.19
– Recent 3.8 patches for user namespace support
– A number of features still a WIP
 Functionality
– Provide process level isolation of global resources
• MNT (mount points, file systems, etc.)
• PID (process)
• NET (NICs, routing, etc.)
• IPC (System V IPC resources)
• UTS (host & domain name)
• USER (UID + GID)
– Process(es) in namespace have illusion they are the only processes on the system
– Generally constructs exist to permit “connectivity” with parent namespace
 Usage
– Construct namespace(s) of desired type
– Create process(es) in namespace (typically done when creating namespace)
– If necessary, initialize “connectivity” to parent namespace
– Process(es) in name space internally function as if they are only proc(s) on system
6/27/2014 8
Namespaces provide isolation of Linux resources for a group of processes.
Linux namespaces: Conceptual Overview
6/27/2014 9
global (i.e. root) namespace
MNT NS
/
/proc
/mnt/fsrd
/mnt/fsrw
/mnt/cdrom
/run2
UTS NS
globalhost
rootns.com
PID NS
PID COMMAND
1 /sbin/init
2 [kthreadd]
3 [ksoftirqd]
4 [cpuset]
5 /sbin/udevd
6 /bin/sh
7 /bin/bash
IPC NS
SHMID OWNER
32452 root
43321 boden
SEMID OWNER
0 root
1 Boden
MSQID OWNER
NET NS
lo: UNKNOWN…
eth0: UP…
eth1: UP…
br0: UP…
app1 IP:5000
app2 IP:6000
app3 IP:7000
USER NS
root 0:0
ntp 104:109
mysql 105:110
boden 106:111
purple namespace
MNT NS
/
/proc
/mnt/purplenfs
/mnt/fsrw
/mnt/cdrom
UTS NS
purplehost
purplens.com
PID NS
PID COMMAND
1 /bin/bash
2 /bin/vim
IPC NS
SHMID OWNER
SEMID OWNER
0 root
MSQID OWNER
NET NS
lo: UNKNOWN…
eth0: UP…
app1 IP:1000
app2 IP:7000
USER NS
root 0:0
app 106:111
blue namespace
MNT NS
/
/proc
/mnt/cdrom
/bluens
UTS NS
bluehost
bluens.com
PID NS
PID COMMAND
1 /bin/bash
2 python
3 node
IPC NS
SHMID OWNER
SEMID OWNER
MSQID OWNER
NET NS
lo: UNKNOWN…
eth0: DOWN…
eth1: UP
app1 IP:7000
app2 IP:9000
USER NS
root 0:0
app 104:109
Linux namespaces: Common Idioms
 It’s not required to use all namespaces
– Pick & choose; if your toolset allows it
 Constructs exist to permit “connectivity” between parent /
child namespace
 Various linux user space tools have namespace support
 Linux sys API supports flexible namespace creation
6/27/2014 10
Linux namespaces & cgroups: Availability
6/27/2014 11
Note: user namespace support in
upstream kernel 3.8+, but
distributions rolling out phased
support:
- Map LXC UID/GID between
container and host
- Non-root LXC creation
Linux chroot & pivot_root
 Using pivot_root with MNT namespace addresses escaping chroot
concerns
 The pivot_root target directory becomes the “new root FS”
6/27/2014 12
LXC Images
LXC images provide a flexible means to deliver only what you need – lightweight and minimal
footprint.
 Basic constraints
– Same architecture & endian
– Linux’ish Operating System; you can run different Linux distros on same host
 Image types
– System; virtualize Operating System(s) – standard distro root FS less the kernel
– Application; virtualize application(s) – only package apps + dependencies (aka JeOS – Just
enough Operating System)
 Bind mount host libs / bins into LXC to share host resources
 Container image init process
– Container init command provided on invocation – can be an application or a full fledged
init process
– Init script customized for image – skinny SysVinit, upstart, etc.
– Reduces overhead of lxc start-up and runtime foot print
 Various tools to build images
– SuSE Kiwi
– Debootstrap
– Etc.
 LXC tooling options often include numerous image templates
6/27/2014 13
Linux Security Modules & MAC
 Linux Security Modules (LSM) – kernel modules which provide a
framework for Mandatory Access Control (MAC) security implementations
 MAC vs DAC
– In MAC, admin (user or process) assigns access controls to subject / initiator
– In DAC, resource owner (user) assigns access controls to individual resources
 Existing LSM implementations include: AppArmor, SELinux, GRSEC, etc.
6/27/2014 14
Linux Capabilities
 Per process privileges which define sys call
access
 Can be assigned to LXC process(es)
6/27/2014 15
Other Security Measures
 Reduce shared FS access using RO bind mounts
 Linux seccomp
– Confine system calls
 Keep Linux kernel up to date
 User namespaces in 3.8+ kernel; mitigates container root
escape concerns
– Launching containers as non-root user
– Mapping UID / GID into container
6/27/2014 16
LXC Security Triangle
 Private environments (trusted code)
– App packaging / deployment / management / etc, devOps, Cloud, etc…
No additional worries about security
 Public environments
– Single tenant
• Same restrictions as private envs; tenant trusted code
– Multi-tenant
6/27/2014 17
Privileges, Multitenancy, Untrusted Code
SecurityMeasures
LSM, capabilities,
seccomp, RO bind mounts,
GRSEC, etc.
LXC Security Triangle
NOTE: User namespaces will
mitigate root escape
concerns
So You Want To Build A Container?
6/27/2014 18
LXC Engine: A “Hypervisor” for Containers
6/27/2014 19
Hardware
Hypervisor
Virtual Machine
Operating
System
Bins / libs
App App
Virtual Machine
Operating
System
Bins / libs
App App
Hardware
Operating System
Container
Bins / libs
App App
Container
Bins / libs
App App
Hardware
Operating System
Container
Bins / libs
App App
Container
Bins / libs
App App
LXC Engine / Tooling
Type 1 Hypervisor Linux Containers LXC Engine / Tooling
Conceptual Mapping
VM  Container
Hypervisor  LXC Engine
Docker Container Engine
6/27/2014 20
PC Mac Virtual Machine Bare Metal
Docker
Image
Develop container image once – run anywhere the docker engine is supported.
Typical Container Engine / Tooling Functionality
 Standard container operations to realize / manage LXCs
– Run, start, stop, delete, attach, snapshot, etc.
 APIs
– CLI, Web API (RESTful or other), Automation files (e.g. dockerfile)
– Security – who can use the APIs
 Images
– Format (typically RAW), required dev files, etc..
– Image repository or catalog
– Container FS layers (union FS style)
– Import / export images and containers
 Metrics
– CPU, memory, block IO, etc..
 Eventing
– Push events on container changes
 Logs
– Inspect / manage container logs
 Checkpointing
– Freeze a container’s state
6/27/2014 21
Sample concepts below; not a complete list.
LXC Host / Engine Portability Considerations
 Linux Kernel dependencies
– Non-standard kernel modules needed by app(s) on image
– LXC related kernel features tied to specific versions of the Kernel
 LSM profile settings; security
– Different profile config per LSM (e.g. AppArmor, SELinux, etc.)
 Container file system / storage type & capability
– FS type
– FS capabilities such as direct I/O
 External storage
– FS vols bind mounted into container
 Network options
– Linux bridging, OVS, etc…
 Shared host libs / bins; dependencies from host OS
6/27/2014 22
Sample concepts below; not a complete list.
Not well solidified in current open source community.
LXC Runtime Configuration Considerations
 Linux capabilities
– Applied to container process for Linux security
– Sysctl settings
 Cgroups settings
– Allowed devices, limits, UID / GID mappings, etc.
 Environment variables
– Exposed to shell / procs in container
 Namespaces
– Which to use for container procs
 Networking
– Type, IP, gateway, hostname, etc..
 Init
– Which cmd / bin to run on container init
6/27/2014 23
Example: http://tinyurl.com/nj4u3le
Sample concepts below; not a complete list.
Community consolidating around docker’s libcontainer project.
LXC Industry Tooling
Parallels OpenVZ Linux
VServer
Libvirt-lxc Lxc (tools) Warden lmctfy Docker
Summary Commercial
product
using
OpenVZ
under the
hood
Custom
Kernel (pre
GNU 3.13)
providing
well
seasoned
LXC support.
A set of
kernel
patches
providing
LXC. Not
based on
cgroups or
namespaces.
Libvirt support
for LXC via
cgroups and
namespaces.
Lib + set of user
spaces tools
/bindings for
LXC.
LXC
management
tooling used by
CF.
Similar to LXC,
but provides
more intent
based focus.
Commoditizatio
n of LXC adding
support for
images, build
files, etc.
Part of
upstream
Kernel?
No No Partial Yes Yes Yes Yes, but
additional
patches needed
for specific
features.
Yes
License Commercial GNU GPL v2 GNU GPL v2 GNU LGPL GNU LGPL Apache v2 Apache v2 Apache v2
APIs /
Bindings
- CLI
- API
- CLI
- C
- CLI
- C
- Python
- Java
- C#
- PHP
- Python
- Lua
- GO
- CLI
- GO
- REST
- CLI
- Python
- Other 3rd
party libs
Managem
ent plane/
Dashboard
Parallels
Cloud Server
Parallels +
others
- OpenStack
- Archipel
- Virt-
Manager
- LXC web
panel
- Lexy
- OpenStack
- Shipyard
- Docker UI
6/27/2014 24
LXC Orchestration & Management
 Docker & libvirt-lxc in OpenStack
– Manage containers heterogeneously with traditional VMs… but not w/the level
of support & features we might like
 CoreOS
– Zero-touch admin Linux distro with docker images as the unit of operation
– Centralized key/value store to coordinate distributed environment
 Various other 3rd party apps
– Maestro for docker
– Shipyard for docker
– Fleet for CoreOS
– Etc.
 LXC migration
– Container migration via criu
 But…
– Still no great way to tie all virtual resources together with LXC – e.g. storage +
networking
• IMO; an area which needs focus for LXC to become more generally applicable
6/27/2014 25
LXC Gaps
There are gaps…
 Lack of industry tooling / support
 Live migration still a WIP
 Full orchestration across resources (compute / storage / networking)
 Fears of security
 Not a well known technology… yet
 Integration with existing virtualization and Cloud tooling
 Not much / any industry standards
 Missing skillset
 Slower upstream support due to kernel dev process
 Etc.
6/27/2014 26
LXC: Use Cases For Traditional VMs
There are still use cases where traditional VMs are warranted.
 Virtualization of non Linux based OSs
– Windows
– AIX
– Etc.
 LXC not supported on host
 VM requires unique kernel setup which is not applicable to other VMs on the host
(i.e. per VM kernel config)
 Etc.
6/27/2014 27
Questions?
6/27/2014 28
Links & Resources
 http://www.slideshare.net/BodenRussell/realizing-linux-containerslxc
 http://www.slideshare.net/BodenRussell/kvm-and-docker-lxc-benchmarking-with-
openstack
 https://github.com/dotcloud/docker/tree/master/vendor/src/github.com/docker/
libcontainer
 https://github.com/docker/libchan
 https://github.com/docker/libswarm
 https://help.ubuntu.com/lts/serverguide/lxc.html
 http://www.docker.com/
 http://www.parallels.com/
 http://openvz.org/Main_Page
 http://criu.org/LXC
 https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt
 https://lwn.net/Articles/531114/
 http://www.hansenpartnership.com/OpenStackContainers2014/#/begin
6/27/2014 29

More Related Content

What's hot

Containers are the future of the Cloud
Containers are the future of the CloudContainers are the future of the Cloud
Containers are the future of the Cloud
Pavel Odintsov
 
Linux Container Technology 101
Linux Container Technology 101Linux Container Technology 101
Linux Container Technology 101
inside-BigData.com
 
Docker - container and lightweight virtualization
Docker - container and lightweight virtualization Docker - container and lightweight virtualization
Docker - container and lightweight virtualization
Sim Janghoon
 
Inside Docker for Fedora20/RHEL7
Inside Docker for Fedora20/RHEL7Inside Docker for Fedora20/RHEL7
Inside Docker for Fedora20/RHEL7
Etsuji Nakai
 
Virtualization which isn't: LXC (Linux Containers)
Virtualization which isn't: LXC (Linux Containers)Virtualization which isn't: LXC (Linux Containers)
Virtualization which isn't: LXC (Linux Containers)
Dobrica Pavlinušić
 

What's hot (20)

Containers are the future of the Cloud
Containers are the future of the CloudContainers are the future of the Cloud
Containers are the future of the Cloud
 
Linux Container Technology 101
Linux Container Technology 101Linux Container Technology 101
Linux Container Technology 101
 
LXC, Docker, security: is it safe to run applications in Linux Containers?
LXC, Docker, security: is it safe to run applications in Linux Containers?LXC, Docker, security: is it safe to run applications in Linux Containers?
LXC, Docker, security: is it safe to run applications in Linux Containers?
 
Docker introduction
Docker introductionDocker introduction
Docker introduction
 
Docker, Linux Containers (LXC), and security
Docker, Linux Containers (LXC), and securityDocker, Linux Containers (LXC), and security
Docker, Linux Containers (LXC), and security
 
Introduction to linux containers
Introduction to linux containersIntroduction to linux containers
Introduction to linux containers
 
Lightweight Virtualization: LXC containers & AUFS
Lightweight Virtualization: LXC containers & AUFSLightweight Virtualization: LXC containers & AUFS
Lightweight Virtualization: LXC containers & AUFS
 
LXC
LXCLXC
LXC
 
Docker storage drivers by Jérôme Petazzoni
Docker storage drivers by Jérôme PetazzoniDocker storage drivers by Jérôme Petazzoni
Docker storage drivers by Jérôme Petazzoni
 
LXC – NextGen Virtualization for Cloud benefit realization (cloudexpo)
LXC – NextGen Virtualization for Cloud benefit realization (cloudexpo)LXC – NextGen Virtualization for Cloud benefit realization (cloudexpo)
LXC – NextGen Virtualization for Cloud benefit realization (cloudexpo)
 
Realizing Linux Containers (LXC)
Realizing Linux Containers (LXC)Realizing Linux Containers (LXC)
Realizing Linux Containers (LXC)
 
Rooting Out Root: User namespaces in Docker
Rooting Out Root: User namespaces in DockerRooting Out Root: User namespaces in Docker
Rooting Out Root: User namespaces in Docker
 
Docker: Aspects of Container Isolation
Docker: Aspects of Container IsolationDocker: Aspects of Container Isolation
Docker: Aspects of Container Isolation
 
Docker internals
Docker internalsDocker internals
Docker internals
 
Docker - container and lightweight virtualization
Docker - container and lightweight virtualization Docker - container and lightweight virtualization
Docker - container and lightweight virtualization
 
LXC, Docker, and the future of software delivery | LinuxCon 2013
LXC, Docker, and the future of software delivery | LinuxCon 2013LXC, Docker, and the future of software delivery | LinuxCon 2013
LXC, Docker, and the future of software delivery | LinuxCon 2013
 
Containers, Docker, and Security: State Of The Union (LinuxCon and ContainerC...
Containers, Docker, and Security: State Of The Union (LinuxCon and ContainerC...Containers, Docker, and Security: State Of The Union (LinuxCon and ContainerC...
Containers, Docker, and Security: State Of The Union (LinuxCon and ContainerC...
 
Inside Docker for Fedora20/RHEL7
Inside Docker for Fedora20/RHEL7Inside Docker for Fedora20/RHEL7
Inside Docker for Fedora20/RHEL7
 
Virtualization which isn't: LXC (Linux Containers)
Virtualization which isn't: LXC (Linux Containers)Virtualization which isn't: LXC (Linux Containers)
Virtualization which isn't: LXC (Linux Containers)
 
Lxc- Introduction
Lxc- IntroductionLxc- Introduction
Lxc- Introduction
 

Viewers also liked

Arus bolak balik klompok 4
Arus bolak balik klompok 4Arus bolak balik klompok 4
Arus bolak balik klompok 4
FITRIA NENGSIH
 
Zato my iz_zato
Zato my iz_zatoZato my iz_zato
Zato my iz_zato
marymam
 
03 streamline english destinations
03 streamline english destinations03 streamline english destinations
03 streamline english destinations
thucvat
 
Safe Routes to School - Elise Bremer-Nei
Safe Routes to School - Elise Bremer-NeiSafe Routes to School - Elise Bremer-Nei
Safe Routes to School - Elise Bremer-Nei
njbikeped
 
04 streamline english directions
04 streamline english directions04 streamline english directions
04 streamline english directions
thucvat
 
зато мы из ЗАТО
зато мы из ЗАТОзато мы из ЗАТО
зато мы из ЗАТО
marymam
 
Zato my iz_zato-1
Zato my iz_zato-1Zato my iz_zato-1
Zato my iz_zato-1
marymam
 
зато мы из ЗАТО
зато мы из ЗАТОзато мы из ЗАТО
зато мы из ЗАТО
marymam
 

Viewers also liked (16)

Arus bolak balik klompok 4
Arus bolak balik klompok 4Arus bolak balik klompok 4
Arus bolak balik klompok 4
 
Penalosa Farm: An Organic Haven
Penalosa Farm: An Organic HavenPenalosa Farm: An Organic Haven
Penalosa Farm: An Organic Haven
 
Zato my iz_zato
Zato my iz_zatoZato my iz_zato
Zato my iz_zato
 
03 streamline english destinations
03 streamline english destinations03 streamline english destinations
03 streamline english destinations
 
Safe Routes to School - Elise Bremer-Nei
Safe Routes to School - Elise Bremer-NeiSafe Routes to School - Elise Bremer-Nei
Safe Routes to School - Elise Bremer-Nei
 
MD Grand Akashu Riau
MD Grand Akashu RiauMD Grand Akashu Riau
MD Grand Akashu Riau
 
04 streamline english directions
04 streamline english directions04 streamline english directions
04 streamline english directions
 
Metadata & Brokering - a modern approach for INGV RI
Metadata & Brokering - a modern approach for INGV RI Metadata & Brokering - a modern approach for INGV RI
Metadata & Brokering - a modern approach for INGV RI
 
UCL of Slideshare
UCL of SlideshareUCL of Slideshare
UCL of Slideshare
 
Swimming
SwimmingSwimming
Swimming
 
Metadata & brokering - a modern approach #2
Metadata & brokering - a modern approach #2Metadata & brokering - a modern approach #2
Metadata & brokering - a modern approach #2
 
зато мы из ЗАТО
зато мы из ЗАТОзато мы из ЗАТО
зато мы из ЗАТО
 
Zato my iz_zato-1
Zato my iz_zato-1Zato my iz_zato-1
Zato my iz_zato-1
 
Parts of a School
Parts of a SchoolParts of a School
Parts of a School
 
зато мы из ЗАТО
зато мы из ЗАТОзато мы из ЗАТО
зато мы из ЗАТО
 
Introduction to DRDA CPAs & Business Consultants
Introduction to DRDA CPAs & Business ConsultantsIntroduction to DRDA CPAs & Business Consultants
Introduction to DRDA CPAs & Business Consultants
 

Similar to Linux Container Brief for IEEE WG P2302

Introduction to NetBSD kernel
Introduction to NetBSD kernelIntroduction to NetBSD kernel
Introduction to NetBSD kernel
Mahendra M
 
Evolution of Linux Containerization
Evolution of Linux Containerization Evolution of Linux Containerization
Evolution of Linux Containerization
WSO2
 
Revolutionizing the cloud with container virtualization
Revolutionizing the cloud with container virtualizationRevolutionizing the cloud with container virtualization
Revolutionizing the cloud with container virtualization
WSO2
 

Similar to Linux Container Brief for IEEE WG P2302 (20)

Lightweight Virtualization in Linux
Lightweight Virtualization in LinuxLightweight Virtualization in Linux
Lightweight Virtualization in Linux
 
First steps on CentOs7
First steps on CentOs7First steps on CentOs7
First steps on CentOs7
 
The building blocks of docker.
The building blocks of docker.The building blocks of docker.
The building blocks of docker.
 
Introduction to OS LEVEL Virtualization & Containers
Introduction to OS LEVEL Virtualization & ContainersIntroduction to OS LEVEL Virtualization & Containers
Introduction to OS LEVEL Virtualization & Containers
 
20240201 [HPC Containers] Rootless Containers.pdf
20240201 [HPC Containers] Rootless Containers.pdf20240201 [HPC Containers] Rootless Containers.pdf
20240201 [HPC Containers] Rootless Containers.pdf
 
Evolution of containers to kubernetes
Evolution of containers to kubernetesEvolution of containers to kubernetes
Evolution of containers to kubernetes
 
Introduction to NetBSD kernel
Introduction to NetBSD kernelIntroduction to NetBSD kernel
Introduction to NetBSD kernel
 
Containers and workload security an overview
Containers and workload security an overview Containers and workload security an overview
Containers and workload security an overview
 
Linux container, namespaces & CGroup.
Linux container, namespaces & CGroup. Linux container, namespaces & CGroup.
Linux container, namespaces & CGroup.
 
Evolution of Linux Containerization
Evolution of Linux Containerization Evolution of Linux Containerization
Evolution of Linux Containerization
 
Docker-v3.pdf
Docker-v3.pdfDocker-v3.pdf
Docker-v3.pdf
 
Container Security: How We Got Here and Where We're Going
Container Security: How We Got Here and Where We're GoingContainer Security: How We Got Here and Where We're Going
Container Security: How We Got Here and Where We're Going
 
Containerization is more than the new Virtualization: enabling separation of ...
Containerization is more than the new Virtualization: enabling separation of ...Containerization is more than the new Virtualization: enabling separation of ...
Containerization is more than the new Virtualization: enabling separation of ...
 
Revolutionizing the cloud with container virtualization
Revolutionizing the cloud with container virtualizationRevolutionizing the cloud with container virtualization
Revolutionizing the cloud with container virtualization
 
Intro to kubernetes
Intro to kubernetesIntro to kubernetes
Intro to kubernetes
 
Linux Kernel Security Overview - KCA 2009
Linux Kernel Security Overview - KCA 2009Linux Kernel Security Overview - KCA 2009
Linux Kernel Security Overview - KCA 2009
 
Studies
StudiesStudies
Studies
 
Securing Applications and Pipelines on a Container Platform
Securing Applications and Pipelines on a Container PlatformSecuring Applications and Pipelines on a Container Platform
Securing Applications and Pipelines on a Container Platform
 
Container & kubernetes
Container & kubernetesContainer & kubernetes
Container & kubernetes
 
Security on a Container Platform
Security on a Container PlatformSecurity on a Container Platform
Security on a Container Platform
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Recently uploaded (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Linux Container Brief for IEEE WG P2302

  • 1. Linux Containers (LXC) IEEE P2302 Working Group Brief Boden Russell (brussell@us.ibm.com / bodenru@gmail.com)
  • 2. Definitions  Linux Containers (LXC  LinuX Containers) – Lightweight virtualization – Realized using features provided by a modern Linux kernel – VMs without the hypervisor (kind of)  Containerization of – (Linux) Operating Systems (less the kernel) – Single or multiple applications  LXC as a technology ≠ LXC “tools” 6/27/2014 2
  • 3. Why LXC  Provision in seconds / milliseconds  Near bare metal runtime performance  VM-like agility – it’s still “virtualization”  Flexibility – Containerize a “system” – Containerize “application(s)”  Lightweight – Just enough Operating System (JeOS) – Minimal per container penalty  Open source – free – lower TCO  Supported with OOTB modern Linux kernel  Growing in popularity 6/27/2014 3 “Linux Containers are poised as the next VM in our modern Cloud era…” Manual VM LXC Provision Time Days Minutes Seconds / ms linpack performance @ 45000 0 50 100 150 200 250 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 BM vcpus GFlops Google trends - LXC Google trends - docker
  • 4. LXC Technology Stack 6/27/2014 4 UserSpaceKernelSpace Kernel System Call Interface Architecture Dependent Kernel Code GLIBC / Pseudo FS / User Space Tools & Libs Linux Container Tooling Linux Container Commoditization Orchestration & Management Hardware cgroups namespaces chroots LSM lxc
  • 5. Hypervisors vs. Linux Containers 6/27/2014 5 Hardware Operating System Hypervisor Virtual Machine Operating System Bins / libs App App Virtual Machine Operating System Bins / libs App App Hardware Hypervisor Virtual Machine Operating System Bins / libs App App Virtual Machine Operating System Bins / libs App App Hardware Operating System Container Bins / libs App App Container Bins / libs App App Type 1 Hypervisor Type 2 Hypervisor Linux Containers Containers share the OS kernel of the host and thus are lightweight. However, each container must have the same OS kernel. Containers are isolated, but share OS and, where appropriate, libs / bins.
  • 6. Linux cgroups  History – Work started in 2006 by google engineers – Merged into upstream 2.6.24 kernel due to wider spread LXC usage – A number of features still a WIP  Functionality – Access; which devices can be used per cgroup – Resource limiting; memory, CPU, device accessibility, block I/O, etc. – Prioritization; who gets more of the CPU, memory, etc. – Accounting; resource usage per cgroup – Control; freezing & check pointing – Injection; packet tagging  Usage – cgroup functionality exposed as “resource controllers” (aka “subsystems”) – Subsystems mounted on FS – Top-level subsystem mount is the root cgroup; all procs on host – Directories under top-level mounts created per cgroup – Procs put in tasks file for group assignment – Interface via read / write pseudo files in group 6/27/2014 6 Cgroups provide throttling, prioritization, control and metrics for a group of processes.
  • 7. Linux cgroups Subsystems  cgroups provided via kernel modules – Not always loaded / provided by default – Locate and load with modprobe  Some features tied to kernel version  See: https://www.kernel.org/doc/Documentation/cgroups/ 6/27/2014 7 Subsystem Tunable Parameters blkio - Weighted proportional block I/O access. Group wide or per device. - Per device hard limits on block I/O read/write specified as bytes per second or IOPS per second. cpu - Time period (microseconds per second) a group should have CPU access. - Group wide upper limit on CPU time per second. - Weighted proportional value of relative CPU time for a group. cpuset - CPUs (cores) the group can access. - Memory nodes the group can access and migrate ability. - Memory hardwall, pressure, spread, etc. devices - Define which devices and access type a group can use. freezer - Suspend/resume group tasks. memory - Max memory limits for the group (in bytes). - Memory swappiness, OOM control, hierarchy, etc.. hugetlb - Limit HugeTLB size usage. - Per cgroup HugeTLB metrics. net_cls - Tag network packets with a class ID. - Use tc to prioritize tagged packets. net_prio - Weighted proportional priority on egress traffic (per interface).
  • 8. Linux namespaces  History – Initial kernel patches in 2.4.19 – Recent 3.8 patches for user namespace support – A number of features still a WIP  Functionality – Provide process level isolation of global resources • MNT (mount points, file systems, etc.) • PID (process) • NET (NICs, routing, etc.) • IPC (System V IPC resources) • UTS (host & domain name) • USER (UID + GID) – Process(es) in namespace have illusion they are the only processes on the system – Generally constructs exist to permit “connectivity” with parent namespace  Usage – Construct namespace(s) of desired type – Create process(es) in namespace (typically done when creating namespace) – If necessary, initialize “connectivity” to parent namespace – Process(es) in name space internally function as if they are only proc(s) on system 6/27/2014 8 Namespaces provide isolation of Linux resources for a group of processes.
  • 9. Linux namespaces: Conceptual Overview 6/27/2014 9 global (i.e. root) namespace MNT NS / /proc /mnt/fsrd /mnt/fsrw /mnt/cdrom /run2 UTS NS globalhost rootns.com PID NS PID COMMAND 1 /sbin/init 2 [kthreadd] 3 [ksoftirqd] 4 [cpuset] 5 /sbin/udevd 6 /bin/sh 7 /bin/bash IPC NS SHMID OWNER 32452 root 43321 boden SEMID OWNER 0 root 1 Boden MSQID OWNER NET NS lo: UNKNOWN… eth0: UP… eth1: UP… br0: UP… app1 IP:5000 app2 IP:6000 app3 IP:7000 USER NS root 0:0 ntp 104:109 mysql 105:110 boden 106:111 purple namespace MNT NS / /proc /mnt/purplenfs /mnt/fsrw /mnt/cdrom UTS NS purplehost purplens.com PID NS PID COMMAND 1 /bin/bash 2 /bin/vim IPC NS SHMID OWNER SEMID OWNER 0 root MSQID OWNER NET NS lo: UNKNOWN… eth0: UP… app1 IP:1000 app2 IP:7000 USER NS root 0:0 app 106:111 blue namespace MNT NS / /proc /mnt/cdrom /bluens UTS NS bluehost bluens.com PID NS PID COMMAND 1 /bin/bash 2 python 3 node IPC NS SHMID OWNER SEMID OWNER MSQID OWNER NET NS lo: UNKNOWN… eth0: DOWN… eth1: UP app1 IP:7000 app2 IP:9000 USER NS root 0:0 app 104:109
  • 10. Linux namespaces: Common Idioms  It’s not required to use all namespaces – Pick & choose; if your toolset allows it  Constructs exist to permit “connectivity” between parent / child namespace  Various linux user space tools have namespace support  Linux sys API supports flexible namespace creation 6/27/2014 10
  • 11. Linux namespaces & cgroups: Availability 6/27/2014 11 Note: user namespace support in upstream kernel 3.8+, but distributions rolling out phased support: - Map LXC UID/GID between container and host - Non-root LXC creation
  • 12. Linux chroot & pivot_root  Using pivot_root with MNT namespace addresses escaping chroot concerns  The pivot_root target directory becomes the “new root FS” 6/27/2014 12
  • 13. LXC Images LXC images provide a flexible means to deliver only what you need – lightweight and minimal footprint.  Basic constraints – Same architecture & endian – Linux’ish Operating System; you can run different Linux distros on same host  Image types – System; virtualize Operating System(s) – standard distro root FS less the kernel – Application; virtualize application(s) – only package apps + dependencies (aka JeOS – Just enough Operating System)  Bind mount host libs / bins into LXC to share host resources  Container image init process – Container init command provided on invocation – can be an application or a full fledged init process – Init script customized for image – skinny SysVinit, upstart, etc. – Reduces overhead of lxc start-up and runtime foot print  Various tools to build images – SuSE Kiwi – Debootstrap – Etc.  LXC tooling options often include numerous image templates 6/27/2014 13
  • 14. Linux Security Modules & MAC  Linux Security Modules (LSM) – kernel modules which provide a framework for Mandatory Access Control (MAC) security implementations  MAC vs DAC – In MAC, admin (user or process) assigns access controls to subject / initiator – In DAC, resource owner (user) assigns access controls to individual resources  Existing LSM implementations include: AppArmor, SELinux, GRSEC, etc. 6/27/2014 14
  • 15. Linux Capabilities  Per process privileges which define sys call access  Can be assigned to LXC process(es) 6/27/2014 15
  • 16. Other Security Measures  Reduce shared FS access using RO bind mounts  Linux seccomp – Confine system calls  Keep Linux kernel up to date  User namespaces in 3.8+ kernel; mitigates container root escape concerns – Launching containers as non-root user – Mapping UID / GID into container 6/27/2014 16
  • 17. LXC Security Triangle  Private environments (trusted code) – App packaging / deployment / management / etc, devOps, Cloud, etc… No additional worries about security  Public environments – Single tenant • Same restrictions as private envs; tenant trusted code – Multi-tenant 6/27/2014 17 Privileges, Multitenancy, Untrusted Code SecurityMeasures LSM, capabilities, seccomp, RO bind mounts, GRSEC, etc. LXC Security Triangle NOTE: User namespaces will mitigate root escape concerns
  • 18. So You Want To Build A Container? 6/27/2014 18
  • 19. LXC Engine: A “Hypervisor” for Containers 6/27/2014 19 Hardware Hypervisor Virtual Machine Operating System Bins / libs App App Virtual Machine Operating System Bins / libs App App Hardware Operating System Container Bins / libs App App Container Bins / libs App App Hardware Operating System Container Bins / libs App App Container Bins / libs App App LXC Engine / Tooling Type 1 Hypervisor Linux Containers LXC Engine / Tooling Conceptual Mapping VM  Container Hypervisor  LXC Engine
  • 20. Docker Container Engine 6/27/2014 20 PC Mac Virtual Machine Bare Metal Docker Image Develop container image once – run anywhere the docker engine is supported.
  • 21. Typical Container Engine / Tooling Functionality  Standard container operations to realize / manage LXCs – Run, start, stop, delete, attach, snapshot, etc.  APIs – CLI, Web API (RESTful or other), Automation files (e.g. dockerfile) – Security – who can use the APIs  Images – Format (typically RAW), required dev files, etc.. – Image repository or catalog – Container FS layers (union FS style) – Import / export images and containers  Metrics – CPU, memory, block IO, etc..  Eventing – Push events on container changes  Logs – Inspect / manage container logs  Checkpointing – Freeze a container’s state 6/27/2014 21 Sample concepts below; not a complete list.
  • 22. LXC Host / Engine Portability Considerations  Linux Kernel dependencies – Non-standard kernel modules needed by app(s) on image – LXC related kernel features tied to specific versions of the Kernel  LSM profile settings; security – Different profile config per LSM (e.g. AppArmor, SELinux, etc.)  Container file system / storage type & capability – FS type – FS capabilities such as direct I/O  External storage – FS vols bind mounted into container  Network options – Linux bridging, OVS, etc…  Shared host libs / bins; dependencies from host OS 6/27/2014 22 Sample concepts below; not a complete list. Not well solidified in current open source community.
  • 23. LXC Runtime Configuration Considerations  Linux capabilities – Applied to container process for Linux security – Sysctl settings  Cgroups settings – Allowed devices, limits, UID / GID mappings, etc.  Environment variables – Exposed to shell / procs in container  Namespaces – Which to use for container procs  Networking – Type, IP, gateway, hostname, etc..  Init – Which cmd / bin to run on container init 6/27/2014 23 Example: http://tinyurl.com/nj4u3le Sample concepts below; not a complete list. Community consolidating around docker’s libcontainer project.
  • 24. LXC Industry Tooling Parallels OpenVZ Linux VServer Libvirt-lxc Lxc (tools) Warden lmctfy Docker Summary Commercial product using OpenVZ under the hood Custom Kernel (pre GNU 3.13) providing well seasoned LXC support. A set of kernel patches providing LXC. Not based on cgroups or namespaces. Libvirt support for LXC via cgroups and namespaces. Lib + set of user spaces tools /bindings for LXC. LXC management tooling used by CF. Similar to LXC, but provides more intent based focus. Commoditizatio n of LXC adding support for images, build files, etc. Part of upstream Kernel? No No Partial Yes Yes Yes Yes, but additional patches needed for specific features. Yes License Commercial GNU GPL v2 GNU GPL v2 GNU LGPL GNU LGPL Apache v2 Apache v2 Apache v2 APIs / Bindings - CLI - API - CLI - C - CLI - C - Python - Java - C# - PHP - Python - Lua - GO - CLI - GO - REST - CLI - Python - Other 3rd party libs Managem ent plane/ Dashboard Parallels Cloud Server Parallels + others - OpenStack - Archipel - Virt- Manager - LXC web panel - Lexy - OpenStack - Shipyard - Docker UI 6/27/2014 24
  • 25. LXC Orchestration & Management  Docker & libvirt-lxc in OpenStack – Manage containers heterogeneously with traditional VMs… but not w/the level of support & features we might like  CoreOS – Zero-touch admin Linux distro with docker images as the unit of operation – Centralized key/value store to coordinate distributed environment  Various other 3rd party apps – Maestro for docker – Shipyard for docker – Fleet for CoreOS – Etc.  LXC migration – Container migration via criu  But… – Still no great way to tie all virtual resources together with LXC – e.g. storage + networking • IMO; an area which needs focus for LXC to become more generally applicable 6/27/2014 25
  • 26. LXC Gaps There are gaps…  Lack of industry tooling / support  Live migration still a WIP  Full orchestration across resources (compute / storage / networking)  Fears of security  Not a well known technology… yet  Integration with existing virtualization and Cloud tooling  Not much / any industry standards  Missing skillset  Slower upstream support due to kernel dev process  Etc. 6/27/2014 26
  • 27. LXC: Use Cases For Traditional VMs There are still use cases where traditional VMs are warranted.  Virtualization of non Linux based OSs – Windows – AIX – Etc.  LXC not supported on host  VM requires unique kernel setup which is not applicable to other VMs on the host (i.e. per VM kernel config)  Etc. 6/27/2014 27
  • 29. Links & Resources  http://www.slideshare.net/BodenRussell/realizing-linux-containerslxc  http://www.slideshare.net/BodenRussell/kvm-and-docker-lxc-benchmarking-with- openstack  https://github.com/dotcloud/docker/tree/master/vendor/src/github.com/docker/ libcontainer  https://github.com/docker/libchan  https://github.com/docker/libswarm  https://help.ubuntu.com/lts/serverguide/lxc.html  http://www.docker.com/  http://www.parallels.com/  http://openvz.org/Main_Page  http://criu.org/LXC  https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt  https://lwn.net/Articles/531114/  http://www.hansenpartnership.com/OpenStackContainers2014/#/begin 6/27/2014 29