Container virtualization, behind the scenes: user namespaces and rootless containers
3. Rootless containers
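• Containers that run without root privileges, as opposed to ones that need root
• Docker only requires membership of the docker group, but
docker group ≒ root, so that is rootful rather than rootless
• Why rootless?
• Security
• e.g. CVE-2014-9357: privilege escalation (Docker)
• Usable without root privileges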
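4. A rootless implementation: podman
• Included in RHEL8
• A replacement for Docker
• Podman accepts the same commands as docker
• Unlike Docker, it needs no root daemon; developed mainly by
Red Hat
• Works without root
• Usage is the same as docker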
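6. Container basics
• Containers are built from Linux Namespaces and cgroups
(+ CoW, seccomp, etc……)
• A Linux Namespace isolates an OS resource such as pids
(process IDs)
• Creating namespaces normally requires root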
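8. Namespaces: the types
• mnt : mount points (since 2.4.19)
• ipc : System V IPC and POSIX message queues (since 2.6.19)
• uts : hostname and domain name (since 2.6.19)
• net : network devices, addresses, routing (since 2.6.24)
• pid : process IDs (since 2.6.24)
• user : uid/gid and capabilities (since 2.6.23)
• the implementation was completed in 3.8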
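As a minimal sketch, the namespaces a process belongs to can be inspected under /proc/PID/ns, which exposes one symlink per namespace type. The helper name list_namespaces is made up for this illustration, and which entries appear depends on the running kernel.

```shell
#!/bin/sh
# list_namespaces is a hypothetical helper name, not part of any tool.
# Print "type -> type:[inode]" for every namespace of a process
# (defaults to the current shell). The inode number is what the later
# slides compare before and after unshare.
list_namespaces() {
    pid="${1:-$$}"
    for link in /proc/"$pid"/ns/*; do
        # If /proc is not mounted the glob stays unexpanded; skip it.
        [ -e "$link" ] || [ -L "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink "$link")"
    done
}

list_namespaces
```

Running it before and after unshare should show a different inode number only for the namespaces that were unshared.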
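9. Namespaces: mnt
• Isolates the set of mount points
• e.g. a private /tmp
• The root filesystem can be switched with pivot_root
• A separate /proc can be mounted
• It was added to clone(2) before the CLONE_NEW* naming existed (2.4.19),
which is why its flag is just CLONE_NEWNS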
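13. Namespaces: pid
• Isolates process IDs
• A process has one pid inside the pid namespace and another pid outside it
• The first process in a pid namespace becomes pid 1
• /proc has to be remounted, so combine it with a mnt namespace and mount a new /proc
• ps(1) reads /proc, so without that it still shows the outer pids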
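14. user namespace (new!!)
• Isolates user and group IDs
• uids inside the namespace are mapped to different uids outside
• → an unprivileged user can be uid=0 (root) inside the namespace
• User namespaces were completed in Linux 3.8
• The CLONE_NEWUSER flag was added to clone(2) in 2.6.23, but the implementation
was only finished between 3.5 and 3.8
• On RHEL, user namespaces were disabled up to RHEL7.3 (Kernel 3.10.0)
• From RHEL7.4 they can be enabled with sysctl; on RHEL8 they are enabled by default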
15. user
•
•
• uid=0 ( )
• e.g. (uid=0) / /
SUID / CLONE_FS chroot so / mount propagation
/ audit log( ) etc
• RHEL Fedora Project
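17. Preparation
• On RHEL7/CentOS 7 (7.4 or later), enable user namespaces first (not needed on RHEL8 / Ubuntu)
• sudo sysctl user.max_user_namespaces=31194
• The maximum number of user namespaces defaults to 0 on 7
• Create the test users
• sudo useradd -m -U -u 2001 alice
• sudo useradd -m -U -u 2002 bob
• sudo useradd -m -U -u 2003 -G wheel charlotte; sudo passwd charlotte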
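18. unshare -U
• unshare(1) with -U creates a new user namespace
• No root privileges required
• Inside, you become 65534 (nobody)
• This is the value of sysctl kernel.overflowuid
(kernel.overflowgid)
• It stands in for every uid/gid that has no mapping
• So every file appears to be owned by nobody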
[alice@rutledge ~]$ id # logged in as alice
uid=2001(alice) gid=2001(alice) groups=2001(alice) ...
[alice@rutledge ~]$ readlink /proc/$$/ns/user
user:[4026531837]
[alice@rutledge ~]$ unshare -U # no sudo needed
[nobody@rutledge ~]$ id
uid=65534(nobody) gid=65534(nobody) groups=65534(nobody) ...
[nobody@rutledge ~]$ readlink /proc/$$/ns/user
user:[4026532602]
[nobody@rutledge ~]$ sysctl kernel.overflowuid
kernel.overflowuid = 65534
[nobody@rutledge ~]$ ls -ld /home/* /root/
drwx------. 2 nobody nobody 99 Apr 15 18:36 /home/alice
drwx------. 2 nobody nobody 62 Apr 15 18:11 /home/bob
drwx------. 2 nobody nobody 83 Apr 15 18:32 /home/charlotte
dr-xr-x---. 2 nobody nobody 114 Apr 12 18:55 /root/
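19. What nobody can do
• Try creating files
• Works in /home/alice
• Fails in /home/bob
• → so we are not really nobody
• Seen from outside, the process is still Alice
• The process inside the user namespace runs as alice
• → files it creates belong to Alice
• Only inside the user namespace does alice
appear as nobody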
[nobody@rutledge ~]$ touch /home/alice/file
[nobody@rutledge ~]$ touch /home/bob/file
touch: cannot touch '/home/bob/file': Permission denied
[nobody@rutledge ~]$ ls -l /home/alice/file
-rw-rw-r--. 1 nobody nobody 0 Apr 15 18:40 /home/alice/file
[nobody@rutledge ~]$ ls -l /home/bob/
ls: cannot open directory '/home/bob/': Permission denied
[nobody@rutledge ~]$ exit # leave the user namespace
logout
[alice@rutledge ~]$ ls -l /home/alice/file
-rw-rw-r--. 1 alice alice 0 Apr 15 18:40 /home/alice/file
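20. Mapping uids: alice and nobody
• Writing to /proc/${PID}/uid_map sets the uid mapping of a user namespace
• Format: (first uid inside) (first uid outside) (length)
• There are restrictions
• At most 5 lines
• It can be written only once
• You cannot map another user's uid
• You cannot map more than your own single uid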
[alice@rutledge ~]$ unshare -U
[nobody@rutledge ~]$ id
uid=65534(nobody) gid=65534(nobody) groups=65534(nobody) ...
[nobody@rutledge ~]$ echo $$
2392
--- from another terminal ---
[alice@rutledge ~]$ echo "0 2002 1" > /proc/2392/uid_map
-bash: echo: write error: Operation not permitted
[alice@rutledge ~]$ echo "0 2001 2" > /proc/2392/uid_map
-bash: echo: write error: Operation not permitted
[alice@rutledge ~]$ echo "0 2001 1" > /proc/2392/uid_map
[alice@rutledge ~]$ echo "0 2001 1" > /proc/2392/uid_map
-bash: echo: write error: Operation not permitted
--- back in the first terminal ---
[nobody@rutledge ~]$ id
uid=0(root) gid=65534(nobody) groups=65534(nobody) ...
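The three outcomes in the transcript above follow from the rule that an unprivileged writer may only submit a single line that maps its own uid with length 1. The sketch below models only that rule. The helper name check_unpriv_uid_map is hypothetical, it does not touch /proc, and it does not model the write-once restriction.

```shell
#!/bin/sh
# check_unpriv_uid_map WRITER_UID "INSIDE OUTSIDE LENGTH"
# Hypothetical helper: predicts whether an unprivileged write to uid_map
# would be accepted. It is a simplified model, not the kernel's full check.
check_unpriv_uid_map() {
    writer_uid="$1"
    # Split the map line into its fields (now $1=inside $2=outside $3=length).
    set -- $2
    if [ "$#" -ne 3 ]; then
        echo "deny: need exactly one line of three fields"; return 1
    fi
    if [ "$2" -ne "$writer_uid" ]; then
        echo "deny: outside uid $2 is not the writer's uid"; return 1
    fi
    if [ "$3" -ne 1 ]; then
        echo "deny: length must be 1"; return 1
    fi
    echo "allow"
}

# The three writes alice (uid 2001) attempted:
check_unpriv_uid_map 2001 "0 2002 1" || true
check_unpriv_uid_map 2001 "0 2001 2" || true
check_unpriv_uid_map 2001 "0 2001 1"
```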
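21. Becoming root
• uid=0 inside is mapped to 2001 (alice) outside
• alice's files now appear to be owned by uid=0 (root)
• /home/bob and /root (anything not owned by alice) still appear as
nobody (unmapped)
• unshare -r does all of this in one step
• You become root without sudo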
[nobody@rutledge ~]$ id
uid=0(root) gid=65534(nobody) groups=65534(nobody) ...
[nobody@rutledge ~]$ ls -ld /home/* /home/
drwxr-xr-x. 5 nobody nobody 47 Apr 15 18:21 /home/
drwx------. 2 root nobody 111 Apr 15 18:40 /home/alice
drwx------. 2 nobody nobody 62 Apr 15 18:11 /home/bob
drwx------. 2 nobody nobody 83 Apr 15 18:32 /home/charlotte
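With util-linux, unshare -r (--map-root-user) creates the user namespace and maps the caller's uid and gid to 0 in one step, so the gid becomes 0 as well instead of staying nobody as in the manual write above. The sketch below only reports the uid seen inside such a namespace. The helper name uid_inside_userns is hypothetical, and it prints "unavailable" where unshare is missing or user namespaces are disabled.

```shell
#!/bin/sh
# uid_inside_userns: hypothetical helper. Runs `id -u` inside a fresh user
# namespace created by `unshare -r`; prints "unavailable" if that fails.
uid_inside_userns() {
    if command -v unshare >/dev/null 2>&1 &&
       inside="$(unshare -r id -u 2>/dev/null)"; then
        echo "$inside"
    else
        echo "unavailable"
    fi
}

echo "uid outside: $(id -u)"
echo "uid inside : $(uid_inside_userns)"
```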
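22. What this root cannot do
• Things even this root cannot do
• Read /etc/shadow
• Write to bob's home
• Kill another user's process
• Add a network device
• Mount or unmount filesystems
• poweroff
• Is this really root? 🤔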
[root@rutledge ~]# cat /etc/shadow
cat: /etc/shadow: Permission denied
[root@rutledge ~]# touch /home/bob/file
touch: cannot touch '/home/bob/file': Permission denied
[root@rutledge ~]# pkill NetworkManager
pkill: killing pid 969 failed: Operation not permitted
[root@rutledge ~]# ip link add type veth
RTNETLINK answers: Operation not permitted
[root@rutledge ~]# mount -t tmpfs tmpfs /bin/
mount: /usr/bin: permission denied.
[root@rutledge ~]# umount /boot
umount: /boot: must be superuser to unmount.
[root@rutledge ~]# poweroff
Failed to connect to bus: Operation not permitted
Failed to open initctl fifo: Permission denied
Failed to talk to init daemon.
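23. What this root can do
• Seen from outside the user namespace, it only has alice's privileges
• Inside the user namespace, it has root's privileges
• chroot
• unshare of namespaces other than -U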
[root@rutledge ~]# chroot /
[root@rutledge /]# unshare --pid --fork --mount-proc
[root@rutledge /]# ps -el --forest
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
4 S 0 1 0 0 80 0 - 7337 - pts/1 00:00:00 bash
0 R 0 24 1 0 80 0 - 11184 - pts/1 00:00:00 ps
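24. Combining with other namespaces
• Inside a user namespace, root can create the other namespaces
• mnt: mount becomes possible
• net: network devices can be added
• pid: processes get their own pid numbering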
• user
root
• ok (user
)
• user
[root@rutledge /]# unshare --mount --net --pid --fork --mount-proc
[root@rutledge /]# mount -t tmpfs tmp /tmp/
[root@rutledge /]# findmnt /tmp
TARGET SOURCE FSTYPE OPTIONS
/tmp tmp tmpfs rw,relatime,seclabel,uid=2001,gid=2001
[root@rutledge /]# ip link add type veth
[root@rutledge /]# ip a
1: lo: <LOOPBACK> ...
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> ...
link/ether 22:43:f8:f3:10:60 brd ff:ff:ff:ff:ff:ff
3: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> ...
link/ether e2:d0:8b:dd:19:b0 brd ff:ff:ff:ff:ff:ff
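25. Building a container by hand
• Switch the root filesystem with
chroot/pivot_root
1. Prepare a root filesystem
2. Create the user + mount namespaces
3. pivot_root requires the new root to be a mount point,
so bind-mount it
4. Create a directory for oldroot
5. pivot_root
6. exec chroot into the new root, leaving oldroot behind
7. lazy umount of oldroot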
--- yum needs root, so charlotte installs the root filesystem and hands it to alice ---
[alice@rutledge ~]$ su - charlotte
[charlotte@rutledge ~]$ sudo yum install -y --installroot=/home/alice/wonderland --releasever=8 @core iproute
[charlotte@rutledge ~]$ sudo chown -R alice: /home/alice/wonderland
--- from here on, as alice ---
[alice@rutledge ~]$ unshare -Ur -n -m -pf
[root@rutledge ~]# mkdir -p under_ground
[root@rutledge ~]# mount -o bind wonderland under_ground
[root@rutledge ~]# mkdir -p under_ground/.oldroot
[root@rutledge ~]# cd under_ground
[root@rutledge under_ground]# pivot_root . .oldroot
[root@rutledge under_ground]# exec chroot . /bin/bash -l
[root@rutledge /]# mount -t proc proc /proc
[root@rutledge /]# umount --lazy .oldroot
[root@rutledge /]# findmnt
TARGET SOURCE FSTYPE OPTIONS
/ /dev/mapper/rhel-home[/alice/wonderland] xfs rw,relatime,seclabel,attr2,inode64,noquota
└─/proc proc proc rw,relatime
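26. Remaining problems
• It looks like a container, but……
• su does not work
• → uid_map contains only one uid
• The net namespace cannot reach the outside
• → a veth pair could connect net namespaces, but attaching one end to
the net namespace that owns the NIC requires
real root
• Only bind mounts are available, not overlayfs
• No CoW
• → overlayfs cannot be mounted (depending on the kernel
version) from a user namespace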
[root@rutledge /]# useradd jack
Setting mailbox file permissions: Invalid argument
[root@rutledge /]# su - jack
su: cannot set groups: Operation not permitted
[root@rutledge /]# ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop ...
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
[root@rutledge /]# mkdir -p upper work newroot
[root@rutledge /]# mount -t overlay -o lowerdir=/,upperdir=upper,workdir=work overlay newroot
mount: /mnt: permission denied.
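27. podman
• Now for a real implementation
• Try podman (on RHEL8)
• podman can be installed with yum/dnf
• Start a centos7 container that runs sleep inf
• As with Docker, enter it with podman exec
• No sudo needed (rootless!!)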
[alice@rutledge ~]$ podman run -d centos:centos7 sleep inf
1209...7e74
[alice@rutledge ~]$ podman exec -lit /bin/bash
[root@1209b4cedd82 /]# ps aux --forest
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 6 1.0 0.3 11832 2972 pts/0 Ss 10:23 0:00 /bin/bash
root 19 0.0 0.4 51748 3392 pts/0 R+ 10:23 0:00 \_ ps aux --forest
root 1 0.0 0.0 4372 664 ? Ss 10:22 0:00 sleep inf
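28. Problem 1: podman and uid_map
• su works inside a podman container
• uid_map is 0 2001 1
1 100000 65536 ……
• jack has uid=1000 inside the container and
uid=100999 outside ……
• Without root, writing any uid other than your own
to uid_map should fail
• Where do the uids from 100000 onwards come from?
• 🤔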
[alice@rutledge ~]$ podman exec -lit /bin/bash
[root@1209b4cedd82 /]# useradd jack
[root@1209b4cedd82 /]# su -c id jack
uid=1000(jack) gid=1000(jack) groups=1000(jack)
[root@1209b4cedd82 /]# cat /proc/1/uid_map
0 2001 1
1 100000 65536
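The outside uid for jack follows directly from the two uid_map lines: each line maps a range starting at (inside) to a range starting at (outside), so uid 1000 falls in the second range and becomes 100000 + (1000 - 1) = 100999. A minimal sketch of that lookup, with to_outside_uid as a hypothetical helper name:

```shell
#!/bin/sh
# to_outside_uid INSIDE_UID < uid_map
# Hypothetical helper: translate a uid seen inside a user namespace to the
# uid seen outside, using uid_map lines "inside outside length" on stdin.
# Exits non-zero when the uid is not covered by any line (unmapped).
to_outside_uid() {
    awk -v uid="$1" '
        uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
        END { if (!found) exit 1 }
    '
}

# The map shown in the container above:
printf '0 2001 1\n1 100000 65536\n' | to_outside_uid 1000   # jack
printf '0 2001 1\n1 100000 65536\n' | to_outside_uid 0      # root in the container
```

On a live system the same function can be fed from /proc/PID/uid_map of a container process.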
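29. newuidmap(1) / newgidmap(1)
• Commands shipped in shadow-utils
• They write to /proc/${pid}/uid_map (gid_map)
• They are SUID binaries
= they run as root,
so they can map uids other than your own
• The ranges each user may map are defined in /etc/subuid (subgid)
• Entries are added automatically by useradd
• SUID, so is this really rootless? ……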
[alice@rutledge ~]$ cat /etc/subuid
alice:100000:65536
bob:165536:65536
charlotte:231072:65536
[alice@rutledge ~]$ cat /etc/subgid
alice:100000:65536
bob:165536:65536
charlotte:231072:65536
[alice@rutledge ~]$ unshare -U sleep inf &
[1] 7126
[alice@rutledge ~]$ newuidmap $! 0 2002 1
newuidmap: uid range [0-1) -> [2002-2003) not allowed
[alice@rutledge ~]$ newuidmap $! 0 $(id -u) 1 1 100000 65536
[alice@rutledge ~]$ newgidmap $! 0 $(id -g) 1 1 100000 65536
[alice@rutledge ~]$ cat /proc/$!/uid_map
0 2001 1
1 100000 65536
[alice@rutledge ~]$ cat /proc/$!/gid_map
0 2001 1
1 100000 65536
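The "not allowed" error above can be understood as a range check against /etc/subuid: a requested outside range is accepted only if it lies entirely inside one of the caller's user:start:count entries. The sketch below models just that check. The helper name subuid_allows is hypothetical, and the special case seen in the transcript, where the caller's own uid is also accepted, is deliberately left out.

```shell
#!/bin/sh
# subuid_allows USER START COUNT < /etc/subuid
# Hypothetical helper: succeeds if [START, START+COUNT) lies inside one of
# USER's subordinate uid ranges. A simplified model of the range check only.
subuid_allows() {
    awk -F: -v user="$1" -v start="$2" -v count="$3" '
        $1 == user && start >= $2 && start + count <= $2 + $3 { ok = 1 }
        END { exit ok ? 0 : 1 }
    '
}

# Entries copied from the /etc/subuid shown above.
subuid='alice:100000:65536
bob:165536:65536
charlotte:231072:65536'

printf '%s\n' "$subuid" | subuid_allows alice 100000 65536 && echo "100000+65536: allowed"
printf '%s\n' "$subuid" | subuid_allows alice 2002 1 || echo "2002+1: not allowed"
```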
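31. Problem 2: podman and networking
• A podman container can reach the outside network
• There is an interface called tap0
• grep the process list for it and you find
slirp4netns
• After the kill,
tap0 is gone
• → it is a TUN/TAP device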
[alice@rutledge ~]$ podman exec -lit /bin/bash
[root@1209b4cedd82 /]# curl -I 'https://retrieva.jp/'
HTTP/1.1 200 OK
:
[root@1209b4cedd82 /]# yum install -y iproute
[root@934bf6e4252b /]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue ...
:
2: tap0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel ...
:
[root@934bf6e4252b /]# exit
[alice@rutledge ~]$ ps aux | grep tap0
alice 11881 0.0 0.2 4592 1856 pts/0 S 19:22 0:00 /usr/bin/slirp4netns -c -e 3 -r 4 11870 tap0
[alice@rutledge ~]$ kill 11870
[alice@rutledge ~]$ podman exec -it $(podman ps -ql) ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue ...
:
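32. slirp4netns: what is slirp?
• slirp is software that emulates SLIP
(Serial Line Internet Protocol)
• SLIP is a predecessor of PPP
• The version adapted to connect an unprivileged
net namespace is
slirp4netns
• slirp is also used for QEMU's user-mode networking
• Default IP layout
• default route: 10.0.2.2/24
• DNS forward: 10.0.2.3
• DHCP addresses: 10.0.2.15 - 10.0.2.31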
[alice@rutledge ~]$ podman exec -lit /bin/bash
[root@934bf6e4252b /]# curl 'https://retrieva.jp/' -I
HTTP/1.1 200 OK
:
[root@a041f01d3221 /]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP>...
link/loopback 00:00:00:00:00:00 brd ...
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: tap0: <BROADCAST,UP,LOWER_UP>...
link/ether 0e:3c:3c:65:d9:82 brd ...
inet 10.0.2.100/24 brd 10.0.2.255 scope global tap0
valid_lft forever preferred_lft forever
inet6 fe80::c3c:3cff:fe65:d982/64 scope link
valid_lft forever preferred_lft forever
[root@a041f01d3221 /]# ip route
default via 10.0.2.2 dev tap0
10.0.2.0/24 dev tap0 proto kernel scope link src 10.0.2.100
[root@934bf6e4252b /]# exit
33. slirp4netns: slirp netns
•
root
• net
• SUID
• RHEL8 slirp
listen
• slirp4netns-0.1-2 bind
[alice@rutledge ~]$ ls -l $(which slirp4netns)
-rwxr-xr-x. 1 root root 76264 Aug 11 2018 /usr/bin/slirp4netns
[alice@rutledge ~]$ podman run -p 10080:80 centos:centos7
port bindings are not yet supported by rootless containers
[alice@rutledge ~]$ rpm -q slirp4netns
slirp4netns-0.1-1.dev.gitc4e1bc5.el8+1463+3d8a3dce.x86_64
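34. Problem 3: CoW
• A container image holds a whole OS userland
• With CoW (Copy-on-Write), containers share a base image
+ keep only their own changes
• Docker uses dm-thin or overlayfs, both of which need
root
• Check with podman info
• GraphDriverName is vfs
• GraphRoot is the storage directory under ~/.local/
• RunRoot is /run/user/${UID}/run
• RunRoot holds the files that are bind-mounted
(hosts, resolv.conf, etc.) and pointers into GraphRoot
• e.g. vfs-layers/mountpoints.json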
[alice@rutledge ~]$ podman info
:
store:
  ContainerStore:
    number: 1
  GraphDriverName: vfs
  GraphOptions: []
  GraphRoot: /home/alice/.local/share/containers/storage
  GraphStatus: {}
  ImageStore:
    number: 1
  RunRoot: /run/user/2001/run
[alice@rutledge ~]$ find /run/user/2001/run
:
/run/user/2001/run/vfs-containers/d1ab...eefd
/run/user/2001/run/vfs-containers/d1ab...eefd/userdata
:
/run/user/2001/run/vfs-layers
/run/user/2001/run/vfs-layers/mountpoints.json
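35. podman (vfs)
• mountpoints.json points to the container's root directory
• It is an ordinary directory tree
• jack's home is owned by
100999
(as uid_map says)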
[alice@rutledge ~]$ jq '.[].path' /run/user/2001/run/vfs-layers/mountpoints.json
"/home/alice/.local/share/containers/storage/vfs/dir/aeaa...458a"
[alice@rutledge ~]$ ll /home/alice/.local/share/containers/storage/vfs/dir/aeaa...458a/
total 16
-rw-r--r--. 1 alice alice 12082 Mar 6 02:36 anaconda-post.log
lrwxrwxrwx. 1 alice alice 7 Mar 6 02:34 bin -> usr/bin
drwxr-xr-x. 2 alice alice 6 Mar 6 02:34 dev
[alice@rutledge ~]$ ll /home/alice/.local/share/containers/storage/vfs/dir/aeaa...7458a/home/
total 0
drwx------. 2 100999 100999 62 Apr 15 21:19 jack
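36. podman (vfs)
• One centos:centos7 container takes
210M
• Starting 10 of them without CoW would take
210M*10=2G
• Is there any CoW?
• → usage really grew by about 2G
• No CoW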
[alice@rutledge ~]$ du -sh .local/share/containers/storage/vfs/dir/aeaa...7458a/
210M .local/share/containers/storage/vfs/dir/aeaa...458a/
[alice@rutledge ~]$ df -h .local/share/containers/storage/
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhel-home 20G 4.2G 16G 21% /home
[alice@rutledge ~]$ seq 10 | xargs -I{} podman run -d centos:centos7 sleep inf
[alice@rutledge ~]$ df -h .local/share/containers/storage/
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhel-home 20G 6.3G 14G 32% /home
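The df figures match the simple estimate for a driver that copies the full image per container: 6.3G - 4.2G is about 2.1G, and ten full copies of a 210M tree come to the same amount. As a small worked example, with vfs_usage_mb as a hypothetical helper name:

```shell
#!/bin/sh
# vfs_usage_mb PER_CONTAINER_MB COUNT
# Hypothetical helper: disk usage in MB if every container gets a full
# copy of the image, i.e. no sharing between containers.
vfs_usage_mb() {
    echo $(( $1 * $2 ))
}

vfs_usage_mb 210 10   # ten centos:centos7 containers at 210M each
```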
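37. fuse-overlayfs(1)
• An alternative to vfs
• fuse-overlayfs implements overlayfs in user
space (FUSE)
• Configure it in ~/.config/containers/storage.conf
• storage.driver="overlay"
• storage.options.mount_program="/usr/bin/fuse-overlayfs"
• Before switching drivers, remove the existing
podman storage
• With vfs, XFS reflink would allow
shallow copy/CoW
• orz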
[alice@rutledge ~]$ podman rm -f --all
[alice@rutledge ~]$ podman rmi -f --all
[alice@rutledge ~]$ su -c 'rm /home/alice/.local/' charlotte
[alice@rutledge ~]$ mkdir -p .config/containers/
[alice@rutledge ~]$ cat .config/containers/storage.conf
[storage]
driver = "overlay"
[storage.options]
mount_program = "/usr/bin/fuse-overlayfs"
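The storage.conf shown above can also be generated by a script. The sketch below writes exactly those four settings. The helper name write_storage_conf is hypothetical, and the example writes into a scratch directory so that no real configuration is touched; pass "$HOME/.config/containers" instead to apply it to rootless podman.

```shell
#!/bin/sh
# write_storage_conf DIR
# Hypothetical helper: writes DIR/storage.conf with the overlay driver and
# fuse-overlayfs as mount program, mirroring the file shown in the slide.
write_storage_conf() {
    mkdir -p "$1" || return 1
    cat > "$1/storage.conf" <<'EOF'
[storage]
driver = "overlay"

[storage.options]
mount_program = "/usr/bin/fuse-overlayfs"
EOF
}

# Scratch directory for the demonstration (an assumption of this sketch).
conf_dir="$(mktemp -d)"
write_storage_conf "$conf_dir" && cat "$conf_dir/storage.conf"
```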
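38. podman with fuse-overlayfs
• / is now mounted with fuse-overlayfs
• The overlayfs directories are under ~/.local/share/containers/storage/*/
• diff: the layer's own files (the CoW upper layer)
• work: the overlayfs work directory
• merged: the overlayfs mount point
• merged looks empty from the host
• → it is mounted inside the container's mnt namespace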
[alice@rutledge ~]$ podman run -d centos:centos7 sleep inf
[alice@rutledge ~]$ podman exec -l findmnt /
TARGET SOURCE FSTYPE OPTIONS
/ fuse-overlayfs fuse.fuse-overlayfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
[alice@rutledge ~]$ ll /home/alice/.local/share/containers/storage/overlay/*
/home/alice/.local/share/containers/storage/overlay/2bbb2f38cf08544b67e60954e9da373c67f2d5658a7e6a074afc5818c9805ebe:
total 8
drwxr-xr-x. 4 alice alice 28 Apr 16 23:13 diff
-rw-r--r--. 1 alice alice 26 Apr 16 23:13 link
-rw-rw-r--. 1 alice alice 28 Apr 16 23:13 lower
drwx------. 2 alice alice 6 Apr 16 23:13 merged
drwx------. 3 alice alice 18 Apr 16 23:13 work
:
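The diff, work and merged directories correspond to the upperdir, workdir and mount point of an overlay mount. How podman itself invokes fuse-overlayfs is not shown in these slides, so the sketch below only prints a fuse-overlayfs command line in the documented -o lowerdir=...,upperdir=...,workdir=... form. The helper name overlay_mount_cmd and the directory names passed to it are illustrative assumptions.

```shell
#!/bin/sh
# overlay_mount_cmd LOWER UPPER WORK MERGED
# Hypothetical helper: prints (does not run) a fuse-overlayfs command that
# stacks UPPER on top of LOWER and exposes the result at MERGED.
overlay_mount_cmd() {
    printf 'fuse-overlayfs -o lowerdir=%s,upperdir=%s,workdir=%s %s\n' \
        "$1" "$2" "$3" "$4"
}

# Dry run with directory names modeled on the listing above.
overlay_mount_cmd lower diff work merged
```

Printing instead of mounting keeps the example harmless on machines without FUSE; the printed line can be run by hand inside a user + mnt namespace.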
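39. Problems of rootless and their solutions
• su does not work (uid_map holds a single uid)
• newuidmap(1) / newgidmap(1) (needs SUID)
• The net namespace cannot reach the outside (veth cannot be attached)
• slirp4netns!
• Only bind mounts, no overlayfs (no CoW)
• fuse-overlayfs in a userns
• XFS reflink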
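40. Summary: building a rootless container
1. Prepare a root filesystem
2. Create the user + mnt + net namespaces
3. [NEW] Set the uid/gid maps with newuidmap(1) / newgidmap(1)
4. [UPDATE] pivot_root requires a mount point, so
mount with fuse-overlayfs instead of bind
5. Create a directory for oldroot
6. [NEW] With fuse-overlayfs, mount what the container needs
before pivot_root
• bind mount console and tty under dev/,
mount sys/ and proc/
7. pivot_root
8. exec chroot, leaving oldroot
behind
9. lazy umount of oldroot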
10. [NEW] slirp4netns
11. [NEW] ip route
41. References
• Rootless
• https://www.slideshare.net/AkihiroSuda/rootless
• Namespaces in operation, part 1: namespaces overview [LWN.net]
• https://lwn.net/Articles/531114/
• Namespaces in operation, part 5: User namespaces [LWN.net]
• https://lwn.net/Articles/532593/
• Filesystem mounts in user namespaces [LWN.net]
• https://lwn.net/Articles/652468/
• Anatomy of a user namespaces vulnerability [LWN.net]
• https://lwn.net/Articles/543273/
• Man page of USER_NAMESPACES
• https://linuxjm.osdn.jp/html/LDP_man-pages/man7/user_namespaces.7.html
• util-linux/unshare.c at master · karelzak/util-linux
• https://github.com/karelzak/util-linux/blob/master/sys-utils/unshare.c
• shadow/newuidmap.c at master · shadow-maint/shadow
• https://github.com/shadow-maint/shadow/blob/master/src/newuidmap.c
• hnakamur’s blog: QEMU Wiki Slirp Tap
• http://hnakamur.blogspot.com/2009/08/qemu-wikislirptap.html
• slirp4netns/main.c at master · rootless-containers/slirp4netns
• https://github.com/rootless-containers/slirp4netns/blob/master/main.c
• Working with the Container Storage library and tools in Red Hat Enterprise Linux
• https://www.redhat.com/en/blog/working-container-storage-library-and-tools-red-hat-enterprise-linux
• The State of Rootless Containers
• https://www.slideshare.net/AkihiroSuda/the-state-of-rootless-containers