1. Inside Docker for Fedora20/RHEL7
ver1.8e Etsuji Nakai
Twitter @enakai00
Open Cloud Campus
2. $ who am i

Etsuji Nakai
– Senior solution architect and cloud evangelist at Red Hat.
– The author of the "Professional Linux Systems" series:
• "Self-study Linux: Deploy and Manage by Yourself"
• "Professional Linux Systems: Deployment and Management"
• "Professional Linux Systems: Network Management"
• "Professional Linux Systems: Technology for Next Decade"
• Available only in Japanese (some have Korean translations).
• Translation offers from publishers are welcome ;-)
New OpenStack book is in store now!
3. Contents
What is Linux Container
Device Mapper Thin-Provisioning
Network Namespace
systemd and cgroups
(*) The contents of this document are based on Fedora 20 with docker-io-1.0.0-1.fc20.x86_64
5. Traditional server virtualization

Traditional "server virtualization" is a technology to create software-emulated "virtual machines" hosting various guest operating systems. Four deployment models can be distinguished, each running one or more guest OSes on a physical machine:
– Baremetal: the OS runs directly on the physical machine.
– Hardware-assisted virtualization: the hypervisor is embedded in firmware.
– Software-assisted virtualization: the hypervisor is installed on the physical machine (VMware vSphere, Xen, etc.).
– Software-assisted virtualization: the host OS provides the hypervisor feature (Linux KVM).
6. What is container technology?

"Linux Container" is a Linux kernel feature to confine a group of processes in an independent execution environment called a container.
The Linux kernel provides an independent application execution environment for each container, which includes:
– An independent filesystem.
– An independent network interface and IP address.
– Usage limits for memory and CPU time.
You can use containers on Linux virtual machines in addition to baremetal servers, since containers can co-exist with traditional server virtualization. In either case, a single Linux kernel hosts multiple user spaces, each running its own user processes.
7. Under the hood

Containers separate various kinds of resources. The separations are internally realized with different kernel technologies called "namespaces":
– Filesystem separation → Mount namespace (kernel 2.4.19)
– Hostname separation → UTS namespace (kernel 2.6.19)
– IPC separation → IPC namespace (kernel 2.6.19)
– User (UID/GID) separation → User namespace (kernel 2.6.23 to 3.8)
– Process table separation → PID namespace (kernel 2.6.24)
– Network separation → Network namespace (kernel 2.6.24)
– Usage limits for CPU/memory → Control groups
(*) Reference: "Namespaces in operation, part 1: namespaces overview"
• http://lwn.net/Articles/531114/
A Linux container is realized by integrating these namespace features. There are multiple container management tools, such as lxctools, libvirt and docker, and they may use different subsets of these features.
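The namespace memberships of any process can be inspected directly under /proc; this quick check works on any modern Linux host:

```shell
# Each process exposes its namespaces as symlinks in /proc/<pid>/ns/.
# Two processes are in the same namespace exactly when the link
# targets (the inode numbers in brackets) match.
ls -l /proc/self/ns/
readlink /proc/self/ns/net    # e.g. net:[4026531956]
```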
8. Resource separation / Process tables

Processes in all containers are executed on the same Linux kernel, but inside a container you can see only the processes in that container.
– This is because each container has its own process table. On the host Linux, outside the containers, you can see all processes, including those in the containers.
Processes seen inside the container:
# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 09:49 ? 00:00:00 /bin/sh /usr/local/bin/init.sh
root 35 1 0 09:49 ? 00:00:00 /usr/sbin/sshd
root 47 1 0 09:49 ? 00:00:00 /usr/sbin/httpd
apache 49 47 0 09:49 ? 00:00:00 /usr/sbin/httpd
apache 50 47 0 09:49 ? 00:00:00 /usr/sbin/httpd
...
apache 56 47 0 09:49 ? 00:00:00 /usr/sbin/httpd
root 57 1 0 09:49 ? 00:00:00 /bin/bash

Processes seen outside the container (on the host):
# ps -ef
UID PID PPID C STIME TTY TIME CMD
...
root 802 1 0 18:10 ? 00:01:20 /usr/bin/docker -d --selinux-enabled -H fd://
...
root 3687 802 0 18:49 pts/2 00:00:00 /bin/sh /usr/local/bin/init.sh
root 3736 3687 0 18:49 ? 00:00:00 /usr/sbin/sshd
root 3748 3687 0 18:49 ? 00:00:00 /usr/sbin/httpd
48 3750 3748 0 18:49 ? 00:00:00 /usr/sbin/httpd
...
48 3757 3748 0 18:49 ? 00:00:00 /usr/sbin/httpd
root 3758 3687 0 18:49 pts/2 00:00:00 /bin/bash
9. Resource separation / Process tables (cont.)

In the example on the previous page, the docker daemon fork/exec-ed the initial process "init.sh" and put it in a new PID namespace. After that, all processes fork/exec-ed from init.sh are put in the same namespace.
– Inside the container, the initial process has PID=1, independent of the host. Likewise, its child processes have independent PIDs.
– Since Docker 1.0 doesn't support the UID namespace, the same UID/GIDs are used in the container as on the host. User/group names could differ because /etc/passwd is different in the container.
• Reference: "Docker 1.0 and user namespaces"
https://groups.google.com/forum/#!topic/docker-dev/MoIDYDF3suY
The initial process init.sh (PID=1 in the container) starts sshd and httpd, then keeps respawning bash:
#!/bin/sh
service sshd start
service httpd start
while [[ true ]]; do
/bin/bash
done
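The same PID-namespace behavior can be reproduced without Docker using util-linux's unshare. This is a sketch, not how the docker daemon itself does it, and it assumes the kernel permits unprivileged user namespaces:

```shell
# Fork a child into new user and PID namespaces. --mount-proc also
# creates a mount namespace and remounts /proc, so ps would show only
# processes of the new namespace. The first process in the namespace
# gets PID 1, just like init.sh inside the container.
unshare --user --map-root-user --fork --pid --mount-proc \
    sh -c 'echo "PID inside the namespace: $$"'
```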
10. Resource separation / Filesystem

A specific directory on the host is bind-mounted as the root directory of the container. Inside the container, that directory is seen as the root directory, a mechanism very similar to a "chroot jail." For example, /export/container01/rootfs/ on the host, with its own etc, bin, sbin and so on, becomes / in the container's mount namespace.
When using traditional container management tools such as lxctools or libvirt, you need to prepare the directory contents by hand.
– You can put minimal contents for a specific application, such as application binaries and shared libraries, into the directory.
– It's also possible to copy the whole root filesystem of a specific Linux distribution into the directory.
– If necessary, special filesystems such as /dev, /proc and /sys are mounted in the container by the management tool.
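The bind-mount trick can be tried by hand with unshare: in a new user+mount namespace an unprivileged user may bind-mount a prepared directory. The paths here are only illustrative, and unprivileged user namespaces are assumed to be enabled:

```shell
# Stand-in for a hand-prepared container rootfs directory.
mkdir -p /tmp/rootfs_demo
touch /tmp/rootfs_demo/marker
# Inside a private mount namespace, bind-mount the directory over /mnt;
# the mount is invisible to the rest of the host and vanishes when the
# namespace's last process exits.
unshare --user --map-root-user --mount \
    sh -c 'mount --bind /tmp/rootfs_demo /mnt && ls /mnt'
```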
11. Resource separation / Filesystem (cont.)

Docker provides its own disk-image management system, which mounts the specified image on the host and makes it the root filesystem of the container.
# df -a
Filesystem 1K-blocks Used Available Use% Mounted on
rootfs 10190136 169036 9480428 2% /
/dev/mapper/docker-252:3-130516-d798a41bcba1dbe621bf2dd87de0f9c6dd9f9c8aadb79f84e0170
5ee82f364c6
10190136 169036 9480428 2% /
proc 0 0 0 - /proc
sysfs 0 0 0 - /sys
tmpfs 1025136 0 1025136 0% /dev
shm 65536 0 65536 0% /dev/shm
devpts 0 0 0 - /dev/pts
/dev/vda3 14226800 3013432 10467640 23% /.dockerinit
/dev/vda3 14226800 3013432 10467640 23% /etc/resolv.conf
/dev/vda3 14226800 3013432 10467640 23% /etc/hostname
/dev/vda3 14226800 3013432 10467640 23% /etc/hosts
devpts 0 0 0 - /dev/console
...
# df
Filesystem 1K-blocks Used Available Use% Mounted on
...
/dev/dm-2 10190136 169036 9480428 2%
/var/lib/docker/devicemapper/mnt/d798a41bcba1dbe621bf2dd87de0f9c6dd9f9c8aadb79f84e017
05ee82f364c6
The first listing shows the filesystem seen in a container: the specified disk image is mounted on /, and some files such as /etc/resolv.conf, /etc/hostname and /etc/hosts are separately bind-mounted from the host. The second listing shows the same disk image mounted on the host under /var/lib/docker/devicemapper/mnt/.
12. Resource separation / Network

Containers use Linux's "veth" devices for network communication.
– A veth is a pair of logical NIC devices connected through a (virtual) crossover cable. One side of the veth pair is placed in the container's network namespace so that it can be seen only inside the container. The other side is connected to a Linux bridge on the host.
– The device in the container is renamed, typically to "eth0." By means of the namespace, network settings such as the IP address, routing table and iptables rules are configured independently in the container.
– The connection between the bridge and the physical network is up to the host configuration.
Docker creates a bridge "docker0" (172.17.42.1), to which the host side of each veth pair (vethXX) is attached; packets from containers are forwarded to the physical network with IP masquerade.
– Packets from the physical network targeted at specified ports are forwarded to the container using the port-forwarding feature of iptables.
13. Resource separation / CPU and Memory

Processes inside a container recognize all physical memory and CPU cores, but their actual allocation is restricted with Linux control groups (cgroups).
– In principle, fine-grained allocation control is possible, including the number of CPU cores, CPU time quota and I/O bandwidth.
Docker uses systemd's unit mechanism to manage the group of processes in a container.
– When creating a container, Docker asks systemd to create a new unit to start the initial process. As a result, all processes fork/exec-ed from the initial process belong to the same unit. At the same time, systemd creates a new cgroups group for the unit.
# systemd-cgls
...
└─system.slice
├─docker-cc08291a81556ba55f049e50fd2c04287b04c6cf657a8a9971ef42468a2befa7.scope
│ ├─7444 nginx: master process ngin
│ ├─7458 nginx: worker proces
│ ├─7459 nginx: worker proces
│ ├─7460 nginx: worker proces
│ └─7461 nginx: worker proces
...
"docker-<Container ID>.scope" is the name of the cgroups group.
15. What is Device Mapper?

Device Mapper is a Linux mechanism to create a logical block device which provides additional features on top of physical block devices. This is done through a stack of wrapper modules. Typical modules are:
– dm-raid: adds a software RAID feature
– dm-multipath: adds multipath access to LUNs
– dm-crypt: adds an encryption feature
– dm-delay: adds an access-delay emulation feature
For example, dm-raid exposes /dev/dm1 as a mirror of /dev/sda and /dev/sdb; dm-crypt exposes /dev/dm1 that encrypts/decrypts data on its way to /dev/sda; and dm-delay exposes /dev/dm1 that adds an access delay in front of /dev/sda.
16. What is Device Mapper Thin-Provisioning?

Device Mapper Thin-Provisioning (dm-thin) is a relatively new module which provides "thin provisioning" and "snapshot" features similar to those of commercial storage appliances.
dm-thin uses two block devices: one as the "block pool" and the other as the "metadata device."
– Fixed-size blocks are dynamically allocated to logical devices, so blocks are consumed only when data are actually written.
– Pointers from segments of logical devices to blocks in the block pool are stored in the metadata device.
– CoW (Copy on Write) snapshots are created by allowing the same block to be pointed to from different logical devices. You can create multi-generation snapshots with this mechanism.
In other words, logical devices #001-#003 all draw their blocks from the shared block pool, while the metadata device stores the pointers from segments of the logical devices to blocks in the pool.
17. Using dm-thin through the LVM interface

On recent Linux distributions, you can use dm-thin through the LVM interface as below.
– First, create a volume group as usual.
– Then define a "thin pool." This creates LVs for the block pool and metadata behind the scenes.
# fallocate -l $((1024*1024*1024)) pooldev.img
# losetup -f pooldev.img
# losetup -a
/dev/loop0: [64768]:39781720 (/root/pooldev.img)
# pvcreate /dev/loop0
# vgcreate vg_data /dev/loop0
# lvcreate -L 900M -T vg_data/thinpool
Logical volume "lvol1" created
Logical volume "thinpool" created
# lvs
LV VG Attr LSize Pool Origin Data% Move Log Cpy%Sync Convert
...
lvol0 vg_data -wi------- 4.00m
thinpool vg_data twi-a-tz-- 900.00m 0.00
In the volume group vg_data, the LV "thinpool" serves as the block pool and the LV "lvol1" as the metadata device; logical devices (vol00, vol01, ...) are then carved out of the pool.
18. Using dm-thin through the LVM interface (cont.)

– Define a new logical device, specifying its logical size with the -V option.
– Create a snapshot with "lvcreate -s".
– Snapshots are inactive by default for the sake of data protection. You can use one after activating it with "lvchange -K -ay".
# lvcreate -V 100G -T vg_data/thinpool -n vol00
Logical volume "vol00" created
# lvs
LV VG Attr LSize Pool Origin Data% Move Log Cpy%Sync Convert
...
lvol0 vg_data -wi------- 4.00m
thinpool vg_data twi-a-tz-- 900.00m 0.00
vol00 vg_data Vwi-a-tz-- 100.00g thinpool 0.00
# lvcreate -s --name vol01 vg_data/vol00
Logical volume "vol01" created
# lvs
LV VG Attr LSize Pool Origin Data% Move Log Cpy%Sync Convert
...
lvol0 vg_data -wi------- 4.00m
thinpool vg_data twi-a-tz-- 900.00m 0.00
vol00 vg_data Vwi-a-tz-- 100.00g thinpool 0.00
vol01 vg_data Vwi---tz-k 100.00g thinpool vol00
# lvchange -K -ay /dev/vg_data/vol01
19. Use of Thin-Provisioning in Docker

Docker has a plugin mechanism for image management drivers, and the "Device Mapper driver" is used on Fedora20/RHEL7. It stores each image in a logical device of Device Mapper Thin-Provisioning (dm-thin).
– When starting a new container, a snapshot of the specified image is attached to the container.
– When saving the image with "docker commit", Docker creates a new snapshot of that snapshot. You'd better stop the container with "docker stop" before executing "docker commit."
The container lifecycle maps onto snapshots as follows:
– run: a snapshot of the local image is created when the container starts.
– stop: all processes in the container are stopped (the snapshot image is not deleted).
– start: the stopped container resumes on the same snapshot.
– commit: a new local image is saved by taking a snapshot of the snapshot.
– rm: when the container is removed, the associated snapshot is deleted.
20. How Docker uses Device Mapper Thin-Provisioning

Docker uses the native device-mapper interface of the dm-thin module instead of the LVM interface.
– When the docker service is launched, it loop-mounts the following "data" and "metadata" disk image files and creates a block pool with them.
# ls -lh /var/lib/docker/devicemapper/devicemapper/
total 1.2G
-rw-------. 1 root root 100G May 11 21:37 data
-rw-------. 1 root root 2.0G May 11 22:05 metadata
# losetup
NAME SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE
/dev/loop0 0 0 1 0 /var/lib/docker/devicemapper/devicemapper/data
/dev/loop1 0 0 1 0 /var/lib/docker/devicemapper/devicemapper/metadata
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
...
loop0 7:0 0 100G 0 loop
└─docker-252:3-130516-pool 253:0 0 100G 0 dm
loop1 7:1 0 2G 0 loop
└─docker-252:3-130516-pool 253:0 0 100G 0 dm
In the lsblk output, loop0 backs the block pool device and loop1 the metadata device.
21. How Docker uses Device Mapper Thin-Provisioning (cont.)

The configuration data of logical devices are stored in the following JSON files:
– /var/lib/docker/devicemapper/metadata/<Image ID>
– The logical device with device ID "0" has a special role. It is created with a 10GB size when the Docker service starts for the first time, and Docker initializes it as an empty ext4 filesystem.
– When you download images from an external registry, snapshots of this device are used to store them. Therefore, all logical devices have the same 10GB size and ext4 filesystem.
# docker images enakai/httpd
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
enakai/httpd ver1.0 d3d92adfcafb 36 hours ago 206.6 MB
# cat /var/lib/docker/devicemapper/metadata/d3d92adfcafb* | python -mjson.tool
{
"device_id": 72,
"initialized": false,
"size": 10737418240,
"transaction_id": 99
}
# cat /var/lib/docker/devicemapper/metadata/base | python -mjson.tool
{
"device_id": 0,
"initialized": true,
"size": 10737418240,
"transaction_id": 1
}
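Since these metadata files are plain JSON, their fields are easy to post-process; the snippet below feeds a copy of the d3d92adfcafb entry shown above to Python and converts the byte size to GB:

```shell
# The metadata JSON is reproduced inline so the pipeline is
# self-contained; normally you would cat the metadata file itself.
echo '{"device_id": 72, "initialized": false, "size": 10737418240, "transaction_id": 99}' |
  python3 -c 'import json, sys; m = json.load(sys.stdin); print(m["size"] // 2**30, "GB")'
```

The 10737418240-byte size prints as "10 GB", confirming the uniform 10GB device size.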
22. Manipulating image contents by hand

As a sort of hacking technique, you can mount the disk image contents by hand, using the dmsetup command to interact with the dm-thin module.
– First, using the commands on the previous page, check the "device_id" and "size" of the disk image you want to mount. In addition, check the name of the thin pool with the following command: it's "docker-252:3-130516-pool" in this example.
– For the sake of simplicity, set these values in shell variables.
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
...
loop0 7:0 0 100G 0 loop
└─docker-252:3-130516-pool 253:0 0 100G 0 dm
loop1 7:1 0 2G 0 loop
└─docker-252:3-130516-pool 253:0 0 100G 0 dm
# device_id=72
# size=10737418240
# pool=docker-252:3-130516-pool
23. Manipulating image contents by hand (cont.)

– Activate and mount the logical device with the following commands. Under "rootfs" is the root filesystem seen from the container.
– Finally, unmount and deactivate the logical device.
(*) Modifying the contents of images is not a procedure supported by Docker. You should do it at your own risk, as it may damage the image.
– Reference: https://www.kernel.org/doc/Documentation/device-mapper/thin-provisioning.txt
# dmsetup create myvol --table "0 $(($size / 512)) thin /dev/mapper/$pool $device_id"
# lsblk
...
loop0 7:0 0 100G 0 loop
└─docker-252:3-130516-pool 253:0 0 100G 0 dm
└─myvol 253:1 0 10G 0 dm
loop1 7:1 0 2G 0 loop
└─docker-252:3-130516-pool 253:0 0 100G 0 dm
└─myvol 253:1 0 10G 0 dm
# mount /dev/mapper/myvol /mnt
# ls /mnt
id lost+found rootfs
# cat /mnt/rootfs/var/www/html/index.html
Hello, World!
# umount /mnt
# dmsetup remove myvol
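The --table argument above follows the dm-thin "thin" target format, "<start> <length> thin <pool device> <device id>", with lengths counted in 512-byte sectors; the arithmetic for the 10GB image works out as:

```shell
size=10737418240                # bytes, taken from the image metadata
device_id=72                    # likewise from the metadata
pool=docker-252:3-130516-pool
# Device length in 512-byte sectors: 10737418240 / 512 = 20971520
echo $(( size / 512 ))
# The complete table line handed to "dmsetup create":
echo "0 $(( size / 512 )) thin /dev/mapper/$pool $device_id"
```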
25. Network configuration in Docker

The container's logical NIC "eth0" is connected to a Linux bridge "docker0." Communication between the container and the external network is controlled with iptables on the host.
– Packets from a container are forwarded with IP masquerade.
– Packets from the external network to specified ports are forwarded to a container with iptables' port-forwarding feature.
As an example, start a container with port forwarding from 8000 to 80 and from 2222 to 22.
– One end of the veth pair is connected to the bridge "docker0."
# docker run -itd -p 8000:80 -p 2222:22 enakai/httpd:ver1.0
a7838c84cd008161086839379e4a0be2d0e109e02c779229cde49f53b79ae1d5
# brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.56847afe9799 no veth66c0
# ifconfig docker0
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.17.42.1 netmask 255.255.0.0 broadcast 0.0.0.0
...
26. Network configuration in Docker (cont.)

– The nat table of iptables is configured as below.
① Packets from an external network are processed in the DOCKER chain for port forwarding (the PREROUTING rule).
② Packets from localhost to one of the localhost's IP addresses (except "127.0.0.0/8") are processed in the DOCKER chain, too (the OUTPUT rule).
③ Packets from a container to an external network are forwarded with IP masquerade (the POSTROUTING rule).
④⑤ The port-forwarding configuration specified with "docker run" (the two DNAT rules in the DOCKER chain).
– I'm not sure why "127.0.0.0/8" is excluded in ②, but packets to "127.0.0.0/8" are still handled appropriately, as explained on the next page.
# iptables-save
# Generated by iptables-save v1.4.19.1 on Fri Jun 13 22:36:14 2014
*nat
...
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -d 172.17.0.0/16 -j MASQUERADE
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 2222 -j DNAT --to-destination 172.17.0.23:22
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 8000 -j DNAT --to-destination 172.17.0.23:80
COMMIT
27. Network configuration in Docker (cont.)

– The docker daemon provides a port-forwarding proxy feature; packets which are not processed by iptables are handled by this proxy.
– Originally the feature was prepared for hosts without iptables. I'm not sure why packets to "127.0.0.0/8" are selectively handled this way.
# lsof -i -P
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
...
docker 20003 root 11u IPv6 177010 0t0 TCP *:2222 (LISTEN)
docker 20003 root 12u IPv6 178468 0t0 TCP *:8000 (LISTEN)
...
28. Network namespace manipulation

As a sort of hacking technique, you can directly manipulate network namespaces.
Without Docker, you would use network namespaces in the following steps:
– Define a new namespace.
– Add network configuration to the namespace, such as a logical NIC, IP address, routing table and iptables rules.
– Launch processes in the namespace.
You can use the "ip netns" command to manipulate network namespaces, but some additional steps are needed for the namespaces created by Docker:
– Find the PID of one of the processes in the container.
– The /proc filesystem of that process contains a symlink to the descriptor used to manipulate the namespace.
# systemd-cgls
...
└─system.slice
├─docker-61151db106a7fd6d5cf937a03eac0e9b33c7799d3d48b6cddc83070839afeea9.scope
│ ├─502 /bin/sh /usr/local/bin/init.sh
│ ├─545 /usr/sbin/sshd
│ ├─557 /usr/sbin/httpd
...
# ls -l /proc/502/ns/net
lrwxrwxrwx 1 root root 0 June 13 22:52 /proc/502/ns/net -> net:[4026532255]
29. Network namespace manipulation (cont.)

– By creating a symlink to the descriptor under /var/run/netns/, the ip command recognizes the namespace.
– From this point, you can execute any command inside the namespace "foo-ns."
– For example, by starting bash inside the namespace, you can see the network configuration of the container. Configurations other than the network are the same as on the host, since you switched only the network namespace.
# mkdir /var/run/netns
# ln -s /proc/502/ns/net /var/run/netns/foo-ns
# ip netns
foo-ns
# ip netns exec foo-ns <command>
# ip netns exec foo-ns bash
# ifconfig eth0
eth0: flags=67<UP,BROADCAST,RUNNING> mtu 1500
inet 172.17.0.2 netmask 255.255.0.0 broadcast 0.0.0.0
...
# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.17.42.1 0.0.0.0 UG 0 0 0 eth0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
# exit
30. Adding more logical NICs

With the "ip netns" hacking technique, you can add logical NICs after starting a new container. The following is an example of adding a logical NIC which connects to the physical network through a bridge "br0." (This is not an operation supported by Docker.)
– Create a bridge "br0" and move the IP address of the physical NIC (192.168.200.20/24 in this case) to the bridge.
# brctl addbr br0; ip link set br0 up
# ip addr del 192.168.200.20/24 dev eth0; ip addr add 192.168.200.20/24 broadcast
192.168.200.255 dev br0; brctl addif br0 eth0; route add default gw 192.168.200.1
# echo 'NM_CONTROLLED="no"' >> /etc/sysconfig/network-scripts/ifcfg-eth0
# systemctl enable network.service
The target configuration: in addition to the existing docker0/IP-masquerade path, the container gets a second NIC "eth1" with 192.168.200.99, whose veth peer is attached to br0; br0 bridges the host's eth0 and carries the host address 192.168.200.20 to the external network.
(*) Be sure you understand what these commands do: a mistake may cut off the host's network connection.
31. Adding more logical NICs (cont.)

– Create a veth pair "veth-host / veth-guest," and attach "veth-host" to the bridge br0.
# ip link add name veth-host type veth peer name veth-guest
# ip link set veth-guest down
# brctl addif br0 veth-host
# brctl show br0
bridge name bridge id STP enabled interfaces
br0 8000.525400677470 no eth0
veth-host
• At this point, both veth-host and veth-guest are visible on the host, not in the container.
32. Adding more logical NICs (cont.)

– Add veth-guest to the container's namespace. At this point, veth-guest becomes invisible on the host.
– From here you can use "ip netns exec" to make additional network configurations in the container. The following renames the logical NIC to "eth1," adds an IP address, and modifies the routing table to make eth1 the default route.
# ip link set veth-guest netns foo-ns
# ifconfig veth-guest
veth-guest: error fetching interface information: Device not found
# ip netns exec foo-ns ip link set veth-guest name eth1
# ip netns exec foo-ns ip addr add 192.168.200.99/24 dev eth1
# ip netns exec foo-ns ip link set eth1 up
# ip netns exec foo-ns ip route delete default
# ip netns exec foo-ns ip route add default via 192.168.200.1
33. Adding more logical NICs (cont.)

– Log in to the container and check the network configuration inside it.
– Now you can directly access the container without port forwarding.
– You can remove the symlink under /var/run/netns once you have finished the configuration.
By the way, there is a shell script that automates this procedure:
– jpetazzo/pipework
– https://github.com/jpetazzo/pipework
# ssh enakai@localhost -p 2222
$ ifconfig eth1
eth1 Link encap:Ethernet HWaddr BE:53:16:06:BF:3A
inet addr:192.168.200.99 Bcast:0.0.0.0 Mask:255.255.255.0
...
$ route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.200.1 0.0.0.0 UG 0 0 0 eth1
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
192.168.200.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
$ curl http://192.168.200.99:80
Hello, World!
# rm /var/run/netns/foo-ns
35. Basics of systemd and cgroups

Refer to the following slides for systemd basics.
– Your first dive into systemd
• http://www.slideshare.net/enakai/systemd-study-v14e
In particular, you need to understand how systemd manages cgroups in conjunction with units.
– systemd defines various "units" corresponding to services and daemons.
– When systemd starts a service as a unit, it dynamically creates a cgroups group for that unit. All processes of the service are placed under this group.
– If you specify "CPUShares" and "MemoryLimit" in the unit's configuration file, they are translated to the corresponding cgroups settings. ("CPUShares" specifies the relative weight of CPU time allocation, and "MemoryLimit" specifies the upper limit of memory usage.)
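As a sketch of the last point (the unit name and paths are hypothetical, not taken from any real system), a resource-limited service unit might look like this:

```ini
# /etc/systemd/system/myapp.service -- hypothetical example unit
[Unit]
Description=Example service with cgroups resource limits

[Service]
ExecStart=/usr/sbin/httpd -DFOREGROUND
# Relative weight of CPU time allocation (default is 1024):
CPUShares=512
# Upper limit of memory usage for all processes of the unit:
MemoryLimit=256M
```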
36. Basics of systemd and cgroups (cont.)
You can check the cgroups status managed by systemd with the following command.
# systemd-cgls
├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 23
├─user.slice
│ └─user-0.slice
│ ├─session-1.scope
│ │ ├─439 sshd: root@pts/0
│ │ ├─444 -bash
│ │ ├─464 systemd-cgls
│ │ └─465 systemd-cgls
│ └─user@0.service
│ ├─441 /usr/lib/systemd/systemd --user
│ └─442 (sd-pam)
└─system.slice
├─polkit.service
│ └─352 /usr/lib/polkit-1/polkitd --no-debug
├─auditd.service
│ └─301 /sbin/auditd -n
├─systemd-udevd.service
│ └─248 /usr/lib/systemd/systemd-udevd
...
37. How Docker works with systemd

When starting a container, Docker asks systemd to create a new unit to start the initial process.
– As a result, all processes fork/exec-ed from the initial process belong to the same unit and are placed under the same cgroups group. The unit name is "docker-<container ID>.scope".
# docker run -td -p 8000:80 -p 2222:22 enakai/httpd:ver1.0
# systemd-cgls -a
...
└─system.slice
├─var-lib-docker-devicemapper-mnt-a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a4
7f7c4bc8b37e3b488b.mount
├─docker-a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b.scope
│ ├─496 /bin/sh /usr/local/bin/init.sh
│ ├─538 /usr/sbin/sshd
│ ├─550 /usr/sbin/httpd
│ ├─552 /bin/bash
│ ├─553 /usr/sbin/httpd
│ ├─554 /usr/sbin/httpd
│ ├─555 /usr/sbin/httpd
│ ├─556 /usr/sbin/httpd
│ ├─557 /usr/sbin/httpd
│ ├─558 /usr/sbin/httpd
│ ├─559 /usr/sbin/httpd
│ └─560 /usr/sbin/httpd
...
38. How Docker works with systemd (cont.)

– You can check the status of the unit corresponding to a container.
# unitname=docker-a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b.scope
# systemctl status $unitname
docker-a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b.scope - docker
container a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b
Loaded: loaded (/run/systemd/system/docker-
a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b.scope; static)
Drop-In: /run/systemd/system/docker-
a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b.scope.d
└─90-BlockIOAccounting.conf, 90-CPUAccounting.conf, 90-Description.conf, 90-
MemoryAccounting.conf, 90-Slice.conf
Active: active (running) since Fri 2014-06-13 23:05:27 JST; 1min 41s ago
CGroup: /system.slice/docker-
a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b.scope
├─496 /bin/sh /usr/local/bin/init.sh
├─538 /usr/sbin/sshd
├─550 /usr/sbin/httpd
├─552 /bin/bash
├─553 /usr/sbin/httpd
├─554 /usr/sbin/httpd
├─555 /usr/sbin/httpd
...
└─560 /usr/sbin/httpd
Jun 13 23:05:27 fedora20 systemd[1]: Started docker container
a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a...488b.
Hint: Some lines were ellipsized, use -l to show in full.
39. How Docker works with systemd (cont.)

– The "docker run" command has "-c" and "-m" options, which are translated to the unit's configuration parameters "CPUShares" and "MemoryLimit".
– After starting a container, you can change these parameters through systemd's interface.
systemd will be further integrated with cgroups in the future. After that, additional resource controls (CPU pinning, CPU quota, I/O bandwidth) may be added to Docker.
# systemctl show $unitname | grep -E "(CPUShares=|MemoryLimit=)"
CPUShares=1024
MemoryLimit=18446744073709551615
# systemctl set-property $unitname CPUShares=512 --runtime
# systemctl show $unitname | grep -E "(CPUShares=|MemoryLimit=)"
CPUShares=512
MemoryLimit=18446744073709551615
40. Inside Docker for Fedora20/RHEL7
Etsuji Nakai
Twitter @enakai00
Open Cloud Campus
Let's learn up-to-date technology with Fedora/RHEL!