rootless
User namespace
•
•
•
•
•
• user
• user
• podman
• podman uid_map
• podman
• podman
• rootless
•
: rootless
• root root
• Docker docker group
docker group ≒ root rootless rootful
• rootless
•
• e.g. CVE-2014-9357: (Docker)
• root
rootless : podman
• RHEL8
• Docker
• Podman docker
• root daemon Docker
Red Hat
• root
• docker
• RHEL8 podman rootless
•
•
• Retrieva Tech Blog
• [🔍 TECH Blog]
•
•
• Linux namespaces, cgroups
(+ CoW, seccomp, etc.)
• Linux Namespace pid ( )OS
ID ( )
•
• root
:
•
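• A process's namespaces are exposed under /proc/${PID}/ns/
• A child created by fork inherits its parent's namespaces
• New namespaces are created with clone(2) or unshare(2)
• setns(2) joins an existing namespace
• It takes a file descriptor opened from /proc/${PID}/ns/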
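As a rough illustration of the /proc/${PID}/ns/ entries mentioned above, the following sketch compares namespace identifiers between a shell and a child process. The helper names ns_id and same_ns are made up for this example, and it assumes a Linux system with /proc mounted.

```shell
#!/bin/sh
# ns_id PID TYPE: print the identifier of one namespace of a process,
# e.g. "user:[4026531837]". Prints nothing if the link cannot be read.
ns_id() {
    readlink "/proc/$1/ns/$2" 2>/dev/null
}

# same_ns PID1 PID2 TYPE: succeed when both processes are in the same
# namespace of the given type (mnt, ipc, uts, net, pid or user).
same_ns() {
    a=$(ns_id "$1" "$3")
    b=$(ns_id "$2" "$3")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# A child started without any clone(2)/unshare(2) namespace flag inherits
# the parent's namespaces, so both lines should show the same identifier.
echo "parent: $(ns_id $$ user)"
echo "child:  $(sh -c 'readlink /proc/$$/ns/user' 2>/dev/null)"
```

Running the same comparison against a process started with unshare(1) should show a different identifier for the unshared namespace type.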
:
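• mnt: mount points (since 2.4.19)
• ipc: IPC resources (since 2.6.19)
• uts: hostname and domain name (since 2.6.19)
• net: network stack (since 2.6.24)
• pid: process IDs (since 2.6.24)
• user: uid/gid and capabilities (since 2.6.23)
• Completed in 3.8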
: mnt
•
• /tmp
• pivot_root
• /proc
• clone(2) CLONE_NEW* (2.4.19)
CLONE_NEWNS
: ipc
• (InterProcess Communication)
•
• PIPE IPC
• /proc/sys/fs/mqueue
: uts
•
•
• /etc/hosts
mnt
: net
•
•
• net
•
• veth( )
• ip(1)
• /proc/${PID}/ns/ bind
: pid
• id
• pid pid
• pid
• /proc mnt /proc
• ps(1) /proc pid
user new!!
•
• uid
• → uid=0 (root)
• Linux 3.8 User Namespace
• clone(2) CLONE_NEWUSER 2.6.23 clone(2)
3.5 3.8
• RHEL RHEL7.3(Kernel 3.10.0) User Namespace
• RHEL7.4 sysctl RHEL8
user
•
•
• uid=0 ( )
• e.g. (uid=0) / /
SUID / CLONE_FS chroot so / mount propagation
/ audit log( ) etc
• RHEL Fedora Project
User Namespace
•
• root
• etc
• =
• User Namespace
:
• RHEL7/CentOS7 (7.4 ) (RHEL8 / Ubuntu )
• sudo sysctl user.max_user_namespaces=31194
• user 7 0
•
• sudo useradd -m -U -u 2001 alice
• sudo useradd -m -U -u 2002 bob
• sudo useradd -m -U -u 2003 -G wheel charlotte; sudo passwd charlotte
: unshare -U
• unshare(1) -U user
• root
• 65534(nobody)
• sysctl kernel.overflowuid
(kernel.overflowgid)
• uid/gid
• nobody
[alice@rutledge ~]$ id # alice
uid=2001(alice) gid=2001(alice) groups=2001(alice) ...
[alice@rutledge ~]$ readlink /proc/$$/ns/user
user:[4026531837]
[alice@rutledge ~]$ unshare -U # no sudo needed
[nobody@rutledge ~]$ id
uid=65534(nobody) gid=65534(nobody) groups=65534(nobody) ...
[nobody@rutledge ~]$ readlink /proc/$$/ns/user
user:[4026532602]
[nobody@rutledge ~]$ sysctl kernel.overflowuid
kernel.overflowuid = 65534
[nobody@rutledge ~]$ ls -ld /home/* /root/
drwx------. 2 nobody nobody 99 Apr 15 18:36 /home/alice
drwx------. 2 nobody nobody 62 Apr 15 18:11 /home/bob
drwx------. 2 nobody nobody 83 Apr 15 18:32 /home/charlotte
dr-xr-x---. 2 nobody nobody 114 Apr 12 18:55 /root/
: nobody
•
• /home/alice
• /home/bob
• → nobody
• Alice
• user alice
• → Alice
• user alice
• nobody
[nobody@rutledge ~]$ touch /home/alice/file
[nobody@rutledge ~]$ touch /home/bob/file
touch: cannot touch '/home/bob/file': Permission denied
[nobody@rutledge ~]$ ls -l /home/alice/file
-rw-rw-r--. 1 nobody nobody 0 Apr 15 18:40 /home/alice/file
[nobody@rutledge ~]$ ls -l /home/bob/
ls: cannot open directory '/home/bob/': Permission denied
[nobody@rutledge ~]$ exit
logout
[alice@rutledge ~]$ ls -l /home/alice/file
-rw-rw-r--. 1 alice alice 0 Apr 15 18:40 /home/alice/file
: alice nobody
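• Writing to /proc/${PID}/uid_map defines the uid mapping of a user namespace
• Format: (first uid inside the namespace) (first uid outside the namespace) (length)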
•
• (5 )
•
•
•
• uid
• uid
[alice@rutledge ~]$ unshare -U
[nobody@rutledge ~]$ id
uid=65534(nobody) gid=65534(nobody) groups=65534(nobody) ...
[nobody@rutledge ~]$ echo $$
2392
--- in another terminal ---
[alice@rutledge ~]$ echo "0 2002 1" > /proc/2392/uid_map
-bash: echo: write error: Operation not permitted
[alice@rutledge ~]$ echo "0 2001 2" > /proc/2392/uid_map
-bash: echo: write error: Operation not permitted
[alice@rutledge ~]$ echo "0 2001 1" > /proc/2392/uid_map
[alice@rutledge ~]$ echo "0 2001 1" > /proc/2392/uid_map
-bash: echo: write error: Operation not permitted
--- back in the first terminal ---
[nobody@rutledge ~]$ id
uid=0(root) gid=65534(nobody) groups=65534(nobody) ...
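The rejected and accepted writes in the session above can be summarised as a small checker. This is only a sketch of the behaviour the session shows for an unprivileged writer (map your own uid, with length 1); the function name check_unpriv_map is hypothetical, the write-once rule is not modelled, and the kernel's real checks described in user_namespaces(7) are more extensive.

```shell
#!/bin/sh
# check_unpriv_map LINE WRITER_UID
# LINE is a candidate uid_map entry "inside outside length"; WRITER_UID is
# the uid of the unprivileged user writing it from the parent namespace.
check_unpriv_map() {
    printf '%s\n' "$1" | awk -v me="$2" '
        NF != 3  { print "rejected: expected three fields"; exit }
        $2 != me { print "rejected: may only map own uid " me; exit }
        $3 != 1  { print "rejected: length must be 1"; exit }
                 { print "accepted" }'
}

# The three distinct attempts alice (uid 2001) made in the session:
check_unpriv_map "0 2002 1" 2001
check_unpriv_map "0 2001 2" 2001
check_unpriv_map "0 2001 1" 2001
```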
: root
• uid=0 2001(alice)
• alice uid=0(root)
• /home/bob /root (
)alice
nobody( )
• unshare -r
• sudo root
[nobody@rutledge ~]$ id
uid=0(root) gid=65534(nobody) groups=65534(nobody) ...
[nobody@rutledge ~]$ ls -ld /home/* /home/
drwxr-xr-x. 5 nobody nobody 47 Apr 15 18:21 /home/
drwx------. 2 root nobody 111 Apr 15 18:40 /home/alice
drwx------. 2 nobody nobody 62 Apr 15 18:11 /home/bob
drwx------. 2 nobody nobody 83 Apr 15 18:32 /home/charlotte
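The unshare -r shortcut mentioned on this slide creates the user namespace and maps the caller to uid 0 in a single step. A minimal sketch, assuming unshare(1) from util-linux is installed; the wrapper name userns_root_uid and the "unavailable" fallback text are inventions of this example:

```shell
#!/bin/sh
# userns_root_uid: run `id -u` inside a new user namespace in which the
# calling user is mapped to uid 0. Prints "0" on success, or "unavailable"
# when user namespaces are disabled (for example
# user.max_user_namespaces=0) or unshare(1) is missing.
userns_root_uid() {
    unshare -r id -u 2>/dev/null || echo "unavailable"
}

userns_root_uid
```

No sudo is involved: the uid 0 reported here exists only inside the new namespace.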
: root
• root
• /etc/shadow
• bob home
•
•
•
• poweroff
• root 🤔
[root@rutledge ~]# cat /etc/shadow
cat: /etc/shadow: Permission denied
[root@rutledge ~]# touch /home/bob/file
touch: cannot touch '/home/bob/file': Permission denied
[root@rutledge ~]# pkill NetworkManager
pkill: killing pid 969 failed: Operation not permitted
[root@rutledge ~]# ip link add type veth
RTNETLINK answers: Operation not permitted
[root@rutledge ~]# mount -t tmpfs tmpfs /bin/
mount: /usr/bin: permission denied.
[root@rutledge ~]# umount /boot
umount: /boot: must be superuser to unmount.
[root@rutledge ~]# poweroff
Failed to connect to bus: Operation not permitted
Failed to open initctl fifo: Permission denied
Failed to talk to init daemon.
: root
• user alice
•
• user root
• chroot
• -U unshare
•
[root@rutledge ~]# chroot /
[root@rutledge /]# unshare --pid --fork --mount-proc
[root@rutledge /]# ps -el --forest
F S UID PID PPID C PRI NI ADDR    SZ WCHAN TTY   TIME     CMD
4 S   0   1    0 0  80  0 -     7337 -     pts/1 00:00:00 bash
0 R   0  24    1 0  80  0 -    11184 -     pts/1 00:00:00 ps
:
• user user
• mnt mount
• net
• pid
• user
root
• ok (user
)
• user
[root@rutledge /]# unshare --mount --net --pid --fork --mount-proc
[root@rutledge /]# mount -t tmpfs tmp /tmp/
[root@rutledge /]# findmnt /tmp
TARGET SOURCE FSTYPE OPTIONS
/tmp   tmp    tmpfs  rw,relatime,seclabel,uid=2001,gid=2001
[root@rutledge /]# ip link add type veth
[root@rutledge /]# ip a
1: lo: <LOOPBACK> ...
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> ...
    link/ether 22:43:f8:f3:10:60 brd ff:ff:ff:ff:ff:ff
3: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> ...
    link/ether e2:d0:8b:dd:19:b0 brd ff:ff:ff:ff:ff:ff
:
•
chroot/pivot_root
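1. Prepare a root filesystem
2. Create user + mount namespaces
3. Bind-mount the root filesystem, because pivot_root needs a mount point
4. Create the oldroot directory
5. pivot_root
6. Leave oldroot with exec chroot
7. Lazy umount oldroot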
•
--- charlotte installs the root filesystem with yum and hands it to alice ---
[alice@rutledge ~]$ su - charlotte
[charlotte@rutledge ~]$ sudo yum install -y --installroot=/home/alice/wonderland --releasever=8 @core iproute
[charlotte@rutledge ~]$ sudo chown -R alice: /home/alice/wonderland
--- as alice ---
[alice@rutledge ~]$ unshare -Ur -n -m -pf
[root@rutledge ~]# mkdir -p under_ground
[root@rutledge ~]# mount -o bind wonderland under_ground
[root@rutledge ~]# mkdir -p under_ground/.oldroot
[root@rutledge ~]# cd under_ground
[root@rutledge under_ground]# pivot_root . .oldroot
[root@rutledge under_ground]# exec chroot . /bin/bash -l
[root@rutledge /]# mount -t proc proc /proc
[root@rutledge /]# umount --lazy .oldroot
[root@rutledge /]# findmnt
TARGET   SOURCE                                     FSTYPE OPTIONS
/        /dev/mapper/rhel-home[/alice/wonderland]   xfs    rw,relatime,seclabel,attr2,inode64,noquota
└─/proc  proc                                       proc   rw,relatime
:
• ……
• su
• →uid_map 1
• net
• →net veth
NIC net
root
• bind overlayfs
• CoW
• → overlayfs (Kernel
)user
[root@rutledge /]# useradd jack
Setting mailbox file permissions: Invalid argument
[root@rutledge /]# su - jack
su: cannot set groups: Operation not permitted
[root@rutledge /]# ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop ...
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
[root@rutledge /]# mkdir -p upper work newroot
[root@rutledge /]# mount -t overlay -o lowerdir=/,upperdir=upper,workdir=work overlay newroot
mount: /mnt: permission denied.
podman
•
• podman (on RHEL8)
• podman yum dnf
• centos7 sleep inf
• Docker podman exec
• sudo (rootless!!)
[alice@rutledge ~]$ podman run -d centos:centos7 sleep inf
1209...7e74
[alice@rutledge ~]$ podman exec -lit /bin/bash
[root@1209b4cedd82 /]# ps aux --forest
USER PID %CPU %MEM   VSZ  RSS TTY   STAT START TIME COMMAND
root   6  1.0  0.3 11832 2972 pts/0 Ss   10:23 0:00 /bin/bash
root  19  0.0  0.4 51748 3392 pts/0 R+   10:23 0:00  \_ ps aux --forest
root   1  0.0  0.0  4372  664 ?     Ss   10:22 0:00 sleep inf
1: podman uid_map
• podman
• uid_map has two lines: 0 2001 1 and 1 100000 65536 ……
• jack is uid=1000 inside the container, which is uid=100999 outside ……
•
• root
uid_map uid
• 1000000
• 🤔
[alice@rutledge ~]$ podman exec -lit /bin/bash
[root@1209b4cedd82 /]# useradd jack
[root@1209b4cedd82 /]# su -c id jack
uid=1000(jack) gid=1000(jack) groups=1000(jack)
[root@1209b4cedd82 /]# cat /proc/1/uid_map
0 2001 1
1 100000 65536
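To make the arithmetic behind this mapping concrete, here is a minimal sketch that translates a uid seen inside the container to the uid stored on the host, using the two uid_map lines shown above. The function name to_host_uid is hypothetical; the rule it applies (outside = outside_start + (uid - inside_start) for the range that contains the uid) is the documented meaning of the three fields.

```shell
#!/bin/sh
# to_host_uid UID MAP: translate a uid seen inside the namespace into the
# uid outside it. MAP holds uid_map lines "inside outside length".
# Prints nothing when the uid is not covered by any line.
to_host_uid() {
    printf '%s\n' "$2" | awk -v uid="$1" '
        NF == 3 && uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); exit }'
}

map='0 2001 1
1 100000 65536'

to_host_uid 0 "$map"      # root in the container is alice on the host
to_host_uid 1000 "$map"   # jack in the container: 100000 + (1000 - 1)
```

The second call prints 100999, which is the owner that shows up on the host for files created by jack.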
newuidmap(1) / newgidmap(1)
• shadow-utils
• /proc/${pid}/uid_map(gid_map)
•
• SUID
=root
uid
• /etc/subuid (/etc/subgid)
• useradd
•
• SUID rootless ……
[alice@rutledge ~]$ cat /etc/subuid
alice:100000:65536
bob:165536:65536
charlotte:231072:65536
[alice@rutledge ~]$ cat /etc/subgid
alice:100000:65536
bob:165536:65536
charlotte:231072:65536
[alice@rutledge ~]$ unshare -U sleep inf &
[1] 7126
[alice@rutledge ~]$ newuidmap $! 0 2002 1
newuidmap: uid range [0-1) -> [2002-2003) not allowed
[alice@rutledge ~]$ newuidmap $! 0 $(id -u) 1 1 100000 65536
[alice@rutledge ~]$ newgidmap $! 0 $(id -g) 1 1 100000 65536
[alice@rutledge ~]$ cat /proc/$!/uid_map
0 2001 1
1 100000 65536
[alice@rutledge ~]$ cat /proc/$!/gid_map
0 2001 1
1 100000 65536
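The newuidmap arguments used in the session above can be derived from /etc/subuid instead of being typed by hand. The sketch below only builds the argument string and never calls newuidmap itself; the helper names subid_range and newidmap_args are made up, and a temporary file stands in for /etc/subuid so the example can run anywhere.

```shell
#!/bin/sh
# subid_range USER FILE: print "START COUNT" for the first range that FILE
# (in /etc/subuid format, "name:start:count") grants to USER.
subid_range() {
    awk -F: -v user="$1" '$1 == user { print $2, $3; exit }' "$2"
}

# newidmap_args USER OWN_ID FILE: build the mapping used on the slide:
# id 0 in the namespace maps to the user's own id, ids from 1 upward map
# to the subordinate range. Fails if USER has no range in FILE.
newidmap_args() {
    range=$(subid_range "$1" "$3")
    [ -n "$range" ] || return 1
    echo "0 $2 1 1 $range"
}

# Stand-in for /etc/subuid with the entries shown on the slide.
subuid_file=$(mktemp)
printf 'alice:100000:65536\nbob:165536:65536\n' > "$subuid_file"
newidmap_args alice 2001 "$subuid_file"
rm -f "$subuid_file"
```

For alice this prints `0 2001 1 1 100000 65536`, the same arguments passed to newuidmap after the PID in the session.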
SUID rootless
• rootless
• root
• (200 )
• int overflow ……
• uid_map/gid_map
• e.g. user uid
• ( 1 1 newuidmap …… )
2: podman
• podman ( )
• tap0
• grep
slirp4netns
•
tap0
• → TUN/TAP
[alice@rutledge ~]$ podman exec -lit /bin/bash
[root@1209b4cedd82 /]# curl -I 'https://retrieva.jp/'
HTTP/1.1 200 OK
:
[root@1209b4cedd82 /]# yum install -y iproute
[root@934bf6e4252b /]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue ...
:
2: tap0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel ...
:
[root@934bf6e4252b /]# exit
[alice@rutledge ~]$ ps aux | grep tap0
alice 11881 0.0 0.2 4592 1856 pts/0 S 19:22 0:00 /usr/bin/slirp4netns -c -e 3 -r 4 11870 tap0
[alice@rutledge ~]$ kill 11881
[alice@rutledge ~]$ podman exec -it $(podman ps -ql) ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue ...
:
slirp4netns: slirp
• slirp SLIP
(Serial Line Internet Protocol)
• SLIP PPP
•
net
slirp4netns
• QEMU
• IP
• default route: 10.0.2.2/24
• DNS forward: 10.0.2.3
• DHCP addresses: 10.0.2.15 - 10.0.2.31
[alice@rutledge ~]$ podman exec -lit /bin/bash
[root@934bf6e4252b /]# curl 'https://retrieva.jp/' -I
HTTP/1.1 200 OK
:
[root@a041f01d3221 /]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP>...
    link/loopback 00:00:00:00:00:00 brd ...
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tap0: <BROADCAST,UP,LOWER_UP>...
    link/ether 0e:3c:3c:65:d9:82 brd ...
    inet 10.0.2.100/24 brd 10.0.2.255 scope global tap0
       valid_lft forever preferred_lft forever
    inet6 fe80::c3c:3cff:fe65:d982/64 scope link
       valid_lft forever preferred_lft forever
[root@a041f01d3221 /]# ip route
default via 10.0.2.2 dev tap0
10.0.2.0/24 dev tap0 proto kernel scope link src 10.0.2.100
[root@934bf6e4252b /]# exit
slirp4netns: slirp netns
•
root
• net
• SUID
• RHEL8 slirp
listen
• slirp4netns-0.1-2 bind
[alice@rutledge ~]$ ls -l $(which slirp4netns)
-rwxr-xr-x. 1 root root 76264 Aug 11 2018 /usr/bin/slirp4netns
[alice@rutledge ~]$ podman run -p 10080:80 centos:centos7
port bindings are not yet supported by rootless containers
[alice@rutledge ~]$ rpm -q slirp4netns
slirp4netns-0.1-1.dev.gitc4e1bc5.el8+1463+3d8a3dce.x86_64
3: CoW
• OS
• CoW(Copy-on-Write)
+
• Docker dm-thin overlayfs
root
• podman info
• GraphDriverName vfs
• GraphRoot ~/.local/ storage
• RunRoot /run/user/${UID}/run
• RunRoot bind
(hosts resolv.conf ) GraphRoot
• vfs-layers/mountpoints.json
[alice@rutledge ~]$ podman info
:
store:
  ContainerStore:
    number: 1
  GraphDriverName: vfs
  GraphOptions: []
  GraphRoot: /home/alice/.local/share/containers/storage
  GraphStatus: {}
  ImageStore:
    number: 1
  RunRoot: /run/user/2001/run
[alice@rutledge ~]$ find /run/user/2001/run
:
/run/user/2001/run/vfs-containers/d1ab...eefd
/run/user/2001/run/vfs-containers/d1ab...eefd/userdata
:
/run/user/2001/run/vfs-layers
/run/user/2001/run/vfs-layers/mountpoints.json
podman (vfs)
• mountpoints.json
•
• jack is stored as uid 100999 (via uid_map)
•
[alice@rutledge ~]$ jq '.[].path' /run/user/2001/run/vfs-layers/mountpoints.json
"/home/alice/.local/share/containers/storage/vfs/dir/aeaa...458a"
[alice@rutledge ~]$ ll /home/alice/.local/share/containers/storage/vfs/dir/aeaa...458a/
total 16
-rw-r--r--. 1 alice alice 12082 Mar 6 02:36 anaconda-post.log
lrwxrwxrwx. 1 alice alice 7 Mar 6 02:34 bin -> usr/bin
drwxr-xr-x. 2 alice alice 6 Mar 6 02:34 dev
[alice@rutledge ~]$ ll /home/alice/.local/share/containers/storage/vfs/dir/aeaa...7458a/home/
total 0
drwx------. 2 100999 100999 62 Apr 15 21:19 jack
podman(vfs)
• centos:centos7
210M
• 10
210M*10=2G
• CoW
• → 2G
• CoW
[alice@rutledge ~]$ du -sh .local/share/containers/storage/vfs/dir/aeaa...7458a/
210M .local/share/containers/storage/vfs/dir/aeaa...458a/
[alice@rutledge ~]$ df -h .local/share/containers/storage/
Filesystem            Size Used Avail Use% Mounted on
/dev/mapper/rhel-home  20G 4.2G   16G  21% /home
[alice@rutledge ~]$ seq 10 | xargs -I{} podman run -d centos:centos7 sleep inf
[alice@rutledge ~]$ df -h .local/share/containers/storage/
Filesystem            Size Used Avail Use% Mounted on
/dev/mapper/rhel-home  20G 6.3G   14G  32% /home
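The numbers in the session above can be checked with a little arithmetic. The sketch assumes, as the slide suggests, that the vfs driver gives every container a full copy of the 210M image; the function name vfs_usage_mb is hypothetical.

```shell
#!/bin/sh
# vfs_usage_mb IMAGE_MB CONTAINERS: disk space in MB needed when each
# container holds its own full copy of the image (no CoW sharing).
vfs_usage_mb() {
    echo $(( $1 * $2 ))
}

echo "expected: $(vfs_usage_mb 210 10) MB"
# df reported 4.2G used before and 6.3G after starting the 10 containers.
awk 'BEGIN { printf "observed: %.1f GB\n", 6.3 - 4.2 }'
```

Both figures come out at roughly 2.1G, which matches the "210M*10" estimate on the slide.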
fuse-overlayfs(1)
• vfs
• fuse-overlayfs user
overlayfs
• ~/.config/containers/storage.conf
• storage.driver="overlay"
• storage.options.mount_program="/usr/bin/fuse-overlayfs"
•
podman storage
• vfs XFS reflink
shallow copy/CoW
• orz
[alice@rutledge ~]$ podman rm -f --all
[alice@rutledge ~]$ podman rmi -f --all
[alice@rutledge ~]$ su -c 'sudo rm -rf /home/alice/.local/' charlotte
[alice@rutledge ~]$ mkdir -p .config/containers/
[alice@rutledge ~]$ cat .config/containers/storage.conf
[storage]
driver = "overlay"
[storage.options]
mount_program = "/usr/bin/fuse-overlayfs"
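The session above only displays the finished storage.conf. One possible way to create it is sketched below; the function name write_storage_conf is made up, and the demo writes into a scratch directory rather than ~/.config/containers so that an existing configuration is left alone.

```shell
#!/bin/sh
# write_storage_conf DIR: create DIR/storage.conf selecting the overlay
# driver with fuse-overlayfs as the mount program. For a rootless user,
# DIR would normally be ~/.config/containers.
write_storage_conf() {
    mkdir -p "$1" || return 1
    cat > "$1/storage.conf" <<'EOF'
[storage]
driver = "overlay"

[storage.options]
mount_program = "/usr/bin/fuse-overlayfs"
EOF
}

# Demonstrate against a scratch directory.
demo_dir=$(mktemp -d)
write_storage_conf "$demo_dir"
cat "$demo_dir/storage.conf"
rm -rf "$demo_dir"
```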
podman with fuse-overlayfs
• / fuse-overlayfs
• ~/.local/share/containers/storage/*/ overlayfs
• diff: CoW
• work: overlayfs
• merged: overlayfs
•
• → mnt
[alice@rutledge ~]$ podman run -d centos:centos7 sleep inf
[alice@rutledge ~]$ podman exec -l findmnt /
TARGET SOURCE         FSTYPE              OPTIONS
/      fuse-overlayfs fuse.fuse-overlayfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
[alice@rutledge ~]$ ll /home/alice/.local/share/containers/storage/overlay/*
/home/alice/.local/share/containers/storage/overlay/2bbb2f38cf08544b67e60954e9da373c67f2d5658a7e6a074afc5818c9805ebe:
total 8
drwxr-xr-x. 4 alice alice 28 Apr 16 23:13 diff
-rw-r--r--. 1 alice alice 26 Apr 16 23:13 link
-rw-rw-r--. 1 alice alice 28 Apr 16 23:13 lower
drwx------. 2 alice alice 6 Apr 16 23:13 merged
drwx------. 3 alice alice 18 Apr 16 23:13 work
:
rootless
• su (uid_map )
• newuidmap(1) / newgidmap(1) (SUID )
• net (veth )
• slirp4netns !
• (bind )
• bind overlayfs (CoW )
• fuse-overlayfs userns
• XFS reflink
: rootless
1.
2. user + mnt + net
3. [NEW] newuidmap(1) / newgidmap(1)
4. [UPDATE] pivot_root
bind fuse-overlayfs
5. oldroot
6. [NEW] fuse-overlayfs
pivot_root
mnt
•
• dev/ console tty bind
mount sys/ proc/
7. pivot_root
8. oldroot exec
chroot
9. oldroot lazy umount
10.[NEW] slirp4netns
11.[NEW] ip route
• Rootless
• https://www.slideshare.net/AkihiroSuda/rootless
• Namespaces in operation, part 1: namespaces overview [LWN.net]
• https://lwn.net/Articles/531114/
• Namespaces in operation, part 5: User namespaces [LWN.net]
• https://lwn.net/Articles/532593/
• Filesystem mounts in user namespaces [LWN.net]
• https://lwn.net/Articles/652468/
• Anatomy of a user namespaces vulnerability [LWN.net]
• https://lwn.net/Articles/543273/
• Man page of USER_NAMESPACES
• https://linuxjm.osdn.jp/html/LDP_man-pages/man7/user_namespaces.7.html
• util-linux/unshare.c at master · karelzak/util-linux
• https://github.com/karelzak/util-linux/blob/master/sys-utils/unshare.c
• shadow/newuidmap.c at master · shadow-maint/shadow
• https://github.com/shadow-maint/shadow/blob/master/src/newuidmap.c
• hnakamur’s blog: QEMU Wiki Slirp Tap
• http://hnakamur.blogspot.com/2009/08/qemu-wikislirptap.html
• slirp4netns/main.c at master · rootless-containers/slirp4netns
• https://github.com/rootless-containers/slirp4netns/blob/master/main.c
• Working with the Container Storage library and tools in Red Hat Enterprise Linux
• https://www.redhat.com/en/blog/working-container-storage-library-and-tools-red-hat-enterprise-linux
• The State of Rootless Containers
• https://www.slideshare.net/AkihiroSuda/the-state-of-rootless-containers

Container Virtualization, Behind the Scenes: User Namespaces and Rootless Containers

  • 2. • • • • • • user • user • podman • podman uid_man • podman • podman • rootless •
  • 3. : rootless • root root • Docker docker group 
 docker group ≒ root rootless rootfull • rootless • • e.g. CVE-2014-9357: (Docker) • root
  • 4. rootless : podman • RHEL8 • Docker • Podman docker • root daemon Docker RedHat • root • docker
  • 5. • RHEL8 podman rootless • • • Retrieva Tech Blog • [🔍 TECH Blog] •
  • 6. • • Linux Namespace cgroups (+ CoW secomp etc……) • Linux Namespace pid ( )OS ID ( ) • • root
  • 7. : • • /proc/${PID}/ns/ • fork • clone(2) unshare(2) • setns(2) • /proc/${PID}/ns/ fd
  • 8. : • mnt : (2.4.19 ) • ipc : (2.6.19 ) • uts : (2.6.19 ) • net : (2.6.24 ) • pid : ID (2.6.24 ) • user : uid/gid capability (2.6.23 ) • 3.8
  • 9. : mnt • • /tmp • pivot_root • /proc • clone(2) CLONE_NEW* (2.4.19) CLONE_NEWNS
  • 10. : ipc • (InterProcess Communication) • • PIPE IPC • /proc/sys/fs/mqueue
  • 12. : net • • • net • • veth( ) • ip(1) • /proc/${PID}/ns/ bind
  • 13. : pid • id • pid pid • pid • /proc mnt /proc • ps(1) /proc pid
  • 14. user new!! • • uid • → uid=0 (root) • Linux 3.8 User Namespace • clone(2) CLONE_NEWUSER 2.6.23 clone(2) 3.5 3.8 • RHEL RHEL7.3(Kernel 3.10.0) User Namespace • RHEL7.4 sysctl RHEL8
  • 15. user • • • uid=0 ( ) • e.g. (uid=0) / / SUID / CLONE_FS chroot so / mount propagation / audit log( ) etc • RHEL Fedora Project
  • 16. User Namespace • • root • etc • = • User Namespace
  • 17. : • RHEL7/Centos7 (7.4 ) (RHEL8 / Ubuntu ) • sudo sysctl user.max_user_namespaces=31194 • user 7 0 • • sudo useradd -m -U -u 2001 alice • sudo useradd -m -U -u 2002 bob • sudo useradd -m -U -u 2003 -G wheel charlotte; sudo passwd charlotte
  • 18. : unshare -U • unshare(1) -U user • root • 65534(nobody) • sysctl kernel.overflowuid (kernel.overflowgid) • uid/gid • nobdy [alice@rutledge ~]$ id # alice uid=2001(alice) gid=2001(alice) groups=2001(alice) ... [alice@rutledge ~]$ readlink /proc/$$/ns/user user:[4026531837] [alice@rutledge ~]$ unshare -U # sudo [nobody@rutledge ~]$ id uid=65534(nobody) gid=65534(nobody) groups=65534(nobody) ... [nobody@rutledge ~]$ readlink /proc/$$/ns/user user:[4026532602] [nobody@rutledge ~]$ sysctl kernel.overflowuid kernel.overflowuid = 65534 [nobody@rutledge ~]$ ls -ld /home/* /root/ drwx------. 2 nobody nobody 99 Apr 15 18:36 / home/alice drwx------. 2 nobody nobody 62 Apr 15 18:11 / home/bob drwx------. 2 nobody nobody 83 Apr 15 18:32 / home/charlotte dr-xr-x---. 2 nobody nobody 114 Apr 12 18:55 / root/
  • 19. : nobody • • /home/alice • /home/bob • → nobody • Alice • user alice • → Alice • user alice • nobody [nobody@rutledge~]$ touch /home/alice/file [nobody@rutledge ~]$ touch /home/bob/file touch: cannot touch '/home/bob/file': Permission denied [nobody@rutledge ~]$ ls -l /home/alice/file -rw-rw-r--. 1 nobody nobody 0 Apr 15 18:40 / home/alice/file [nobody@rutledge ~]$ ls -l /home/bob/ ls: cannot open directory '/home/bob/': Permission denied [nobody@rutledge ~]$ exit # logout [alice@rutledge ~]$ ls -l /home/alice/file -rw-rw-r--. 1 alice alice 0 Apr 15 18:40 /home/ alice/file
  • 20. : alice nobody
    • /proc/${PID}/uid_map user
    • (uid inside the namespace) (uid outside the namespace) (length)
    • (5 )
    • uid
    • uid

    [alice@rutledge ~]$ unshare -U
    [nobody@rutledge ~]$ id
    uid=65534(nobody) gid=65534(nobody) groups=65534(nobody) ...
    [nobody@rutledge ~]$ echo $$
    2392
    --- ---
    [alice@rutledge ~]$ echo "0 2002 1" > /proc/2392/uid_map
    -bash: echo: write error: Operation not permitted
    [alice@rutledge ~]$ echo "0 2001 2" > /proc/2392/uid_map
    -bash: echo: write error: Operation not permitted
    [alice@rutledge ~]$ echo "0 2001 1" > /proc/2392/uid_map
    [alice@rutledge ~]$ echo "0 2001 1" > /proc/2392/uid_map
    -bash: echo: write error: Operation not permitted
    --- ---
    [nobody@rutledge ~]$ id
    uid=0(root) gid=65534(nobody) groups=65534(nobody) ...
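The four writes in the transcript above (another user's uid, a length of 2, the accepted write, and the repeated write) can be modelled with a small check. This is only a sketch of the rules user_namespaces(7) describes for a writer without CAP_SETUID, not the kernel's implementation; the function name and the `already_written` flag are hypothetical.

```python
def unprivileged_uid_map_ok(map_text: str, writer_uid: int,
                            already_written: bool = False) -> bool:
    """Simplified model of an unprivileged write to /proc/PID/uid_map.

    Assumptions (illustrative, not kernel code):
      * the file can be written only once per user namespace,
      * the data must be exactly one "inside outside length" line,
      * the line must map only the writer's own uid (length 1).
    """
    if already_written:
        return False
    lines = [line.split() for line in map_text.strip().splitlines()]
    if len(lines) != 1 or len(lines[0]) != 3:
        return False
    try:
        inside, outside, length = (int(field) for field in lines[0])
    except ValueError:
        return False
    return inside >= 0 and length == 1 and outside == writer_uid


if __name__ == "__main__":
    alice = 2001
    for text in ("0 2002 1", "0 2001 2", "0 2001 1"):
        print(repr(text), unprivileged_uid_map_ok(text, alice))
```

Passing `already_written=True` reproduces the last rejected write in the transcript, where the same valid line is refused the second time.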
  • 21. : root
    • uid=0 2001(alice)
    • alice uid=0(root)
    • /home/bob /root ( )alice nobody( )
    • unshare -r
    • sudo root

    [nobody@rutledge ~]$ id
    uid=0(root) gid=65534(nobody) groups=65534(nobody) ...
    [nobody@rutledge ~]$ ls -ld /home/* /home/
    drwxr-xr-x. 5 nobody nobody  47 Apr 15 18:21 /home/
    drwx------. 2 root   nobody 111 Apr 15 18:40 /home/alice
    drwx------. 2 nobody nobody  62 Apr 15 18:11 /home/bob
    drwx------. 2 nobody nobody  83 Apr 15 18:32 /home/charlotte
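The `ls -ld` output above, where alice's files appear as root and everything else as nobody, follows from translating host uids through the map `0 2001 1` and falling back to kernel.overflowuid for unmapped ids. A minimal sketch of that lookup, with hypothetical helper names:

```python
OVERFLOW_UID = 65534  # kernel.overflowuid as shown in the transcript


def parse_uid_map(text):
    """Parse uid_map text into (inside, outside, length) integer tuples."""
    return [tuple(int(field) for field in line.split())
            for line in text.strip().splitlines()]


def uid_seen_inside(host_uid, uid_map, overflow=OVERFLOW_UID):
    """Return the uid a process inside the namespace sees for host_uid.

    Host uids that fall outside every mapped range are reported as the
    overflow uid (nobody).
    """
    for inside, outside, length in uid_map:
        if outside <= host_uid < outside + length:
            return inside + (host_uid - outside)
    return overflow


if __name__ == "__main__":
    uid_map = parse_uid_map("0 2001 1")
    for name, uid in (("root", 0), ("alice", 2001), ("bob", 2002)):
        print(name, uid, "->", uid_seen_inside(uid, uid_map))
```

Note that the host's real root (uid 0) is unmapped as well, which is why /home/ and /root/ show up as owned by nobody.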
  • 22. : root
    • root
    • /etc/shadow
    • bob home
    • poweroff
    • root 🤔

    [root@rutledge ~]# cat /etc/shadow
    cat: /etc/shadow: Permission denied
    [root@rutledge ~]# touch /home/bob/file
    touch: cannot touch '/home/bob/file': Permission denied
    [root@rutledge ~]# pkill NetworkManager
    pkill: killing pid 969 failed: Operation not permitted
    [root@rutledge ~]# ip link add type veth
    RTNETLINK answers: Operation not permitted
    [root@rutledge ~]# mount -t tmpfs tmpfs /bin/
    mount: /usr/bin: permission denied.
    [root@rutledge ~]# umount /boot
    umount: /boot: must be superuser to unmount.
    [root@rutledge ~]# poweroff
    Failed to connect to bus: Operation not permitted
    Failed to open initctl fifo: Permission denied
    Failed to talk to init daemon.
  • 23. : root
    • user alice
    • user root
    • chroot
    • -U unshare

    [root@rutledge ~]# chroot /
    [root@rutledge /]# unshare --pid --fork --mount-proc
    [root@rutledge /]# ps -el --forest
    F S UID PID PPID C PRI NI ADDR    SZ WCHAN TTY   TIME     CMD
    4 S   0   1    0 0  80  0 -     7337 -     pts/1 00:00:00 bash
    0 R   0  24    1 0  80  0 -    11184 -     pts/1 00:00:00 ps
  • 24. :
    • user user
    • mnt mount
    • net
    • pid
    • user root
    • ok (user )
    • user

    [root@rutledge /]# unshare --mount --net --pid --fork --mount-proc
    [root@rutledge /]# mount -t tmpfs tmp /tmp/
    [root@rutledge /]# findmnt /tmp
    TARGET SOURCE FSTYPE OPTIONS
    /tmp   tmp    tmpfs  rw,relatime,seclabel,uid=2001,gid=2001
    [root@rutledge /]# ip link add type veth
    [root@rutledge /]# ip a
    1: lo: <LOOPBACK> ...
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> ...
        link/ether 22:43:f8:f3:10:60 brd ff:ff:ff:ff:ff:ff
    3: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> ...
        link/ether e2:d0:8b:dd:19:b0 brd ff:ff:ff:ff:ff:ff
  • 25. :
    • chroot/pivot_root
      1.
      2. user + mount
      3. pivot_root bind
      4. oldroot
      5. pivot_root
      6. oldroot exec chroot
      7. oldroot lazy umount

    --- yum charlotte alice ---
    [alice@rutledge ~]$ su - charlotte
    [charlotte@rutledge ~]$ sudo yum install -y --installroot=/home/alice/wonderland --releasever=8 @core iproute
    [charlotte@rutledge ~]$ sudo chown -R alice: /home/alice/wonderland
    --- alice ---
    [alice@rutledge ~]$ unshare -Ur -n -m -pf
    [root@rutledge ~]# mkdir -p under_ground
    [root@rutledge ~]# mount -o bind wonderland under_ground
    [root@rutledge ~]# mkdir -p under_ground/.oldroot
    [root@rutledge ~]# cd under_ground
    [root@rutledge under_ground]# pivot_root . .oldroot
    [root@rutledge under_ground]# exec chroot . /bin/bash -l
    [root@rutledge /]# mount -t proc proc /proc
    [root@rutledge /]# umount --lazy .oldroot
    [root@rutledge /]# findmnt
    TARGET   SOURCE                                   FSTYPE OPTIONS
    /        /dev/mapper/rhel-home[/alice/wonderland] xfs    rw,relatime,seclabel,attr2,inode64,noquota
    └─/proc  proc                                     proc   rw,relatime
  • 26. :
    • ……
    • su
    • →uid_map 1
    • net
    • →net veth NIC net root
    • bind overlayfs
    • CoW
    • → overlayfs (Kernel )user

    [root@rutledge /]# useradd jack
    Setting mailbox file permissions: Invalid argument
    [root@rutledge /]# su - jack
    su: cannot set groups: Operation not permitted
    [root@rutledge /]# ip a
    1: lo: <LOOPBACK> mtu 65536 qdisc noop ...
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    [root@rutledge /]# mkdir -p upper work newroot
    [root@rutledge /]# mount -t overlay -o lowerdir=/,upperdir=upper,workdir=work overlay newroot
    mount: /mnt: permission denied.
  • 27. podman
    • podman (on RHEL8)
    • podman yum dnf
    • centos7 sleep inf
    • Docker podman exec
    • sudo (rootless!!)

    [alice@rutledge ~]$ podman run -d centos:centos7 sleep inf
    1209...7e74
    [alice@rutledge ~]$ podman exec -lit /bin/bash
    [root@1209b4cedd82 /]# ps aux --forest
    USER PID %CPU %MEM   VSZ  RSS TTY   STAT START TIME COMMAND
    root   6  1.0  0.3 11832 2972 pts/0 Ss   10:23 0:00 /bin/bash
    root  19  0.0  0.4 51748 3392 pts/0 R+   10:23 0:00  \_ ps aux --forest
    root   1  0.0  0.0  4372  664 ?     Ss   10:22 0:00 sleep inf
  • 28. 1: podman uid_map
    • podman
    • uid_map
        0 2001 1
        1 100000 65536
        ……
    • jack uid=1000 uid=100999 ……
    • root uid_map uid
    • 1000000
    • 🤔

    [alice@rutledge ~]$ podman exec -lit /bin/bash
    [root@1209b4cedd82 /]# useradd jack
    [root@1209b4cedd82 /]# su -c id jack
    uid=1000(jack) gid=1000(jack) groups=1000(jack)
    [root@1209b4cedd82 /]# cat /proc/1/uid_map
    0 2001 1
    1 100000 65536
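The host uid behind jack's container uid can be worked out from the two uid_map lines shown above. The following sketch (the function name `host_uid` is hypothetical) walks the map and applies the offset within the matching range:

```python
def host_uid(ns_uid, uid_map_text):
    """Translate a uid inside the user namespace to the uid on the host.

    Each uid_map line is "inside outside length". Returns None when
    ns_uid is not covered by any line.
    """
    for line in uid_map_text.strip().splitlines():
        inside, outside, length = (int(field) for field in line.split())
        if inside <= ns_uid < inside + length:
            return outside + (ns_uid - inside)
    return None


PODMAN_UID_MAP = """\
0 2001 1
1 100000 65536
"""

if __name__ == "__main__":
    print("container root ->", host_uid(0, PODMAN_UID_MAP))
    print("jack (1000)    ->", host_uid(1000, PODMAN_UID_MAP))
```

Container uid 1000 lies in the second range, which starts at container uid 1, so the offset is 999 and the host uid is 100000 + 999 = 100999.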
  • 29. newuidmap(1) / newgidmap(1)
    • shadow-utils
    • /proc/${pid}/uid_map(gid_map)
    • SUID =root uid
    • /etc/subuid(subgid)
    • useradd
    • SUID rootless ……

    [alice@rutledge ~]$ cat /etc/subuid
    alice:100000:65536
    bob:165536:65536
    charlotte:231072:65536
    [alice@rutledge ~]$ cat /etc/subgid
    alice:100000:65536
    bob:165536:65536
    charlotte:231072:65536
    [alice@rutledge ~]$ unshare -U sleep inf &
    [1] 7126
    [alice@rutledge ~]$ newuidmap $! 0 2002 1
    newuidmap: uid range [0-1) -> [2002-2003) not allowed
    [alice@rutledge ~]$ newuidmap $! 0 $(id -u) 1 1 100000 65536
    [alice@rutledge ~]$ newgidmap $! 0 $(id -g) 1 1 100000 65536
    [alice@rutledge ~]$ cat /proc/$!/uid_map
    0 2001 1
    1 100000 65536
    [alice@rutledge ~]$ cat /proc/$!/gid_map
    0 2001 1
    1 100000 65536
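The accept/reject behaviour in the transcript above can be approximated by checking each requested host range against the caller's own uid and the caller's entries in /etc/subuid. This is a simplified sketch under those assumptions, not the logic of shadow-utils itself; `parse_subid` and `mapping_allowed` are hypothetical names.

```python
def parse_subid(text):
    """Parse /etc/subuid-style text into {user: [(start, count), ...]}."""
    ranges = {}
    for line in text.strip().splitlines():
        name, start, count = line.split(":")
        ranges.setdefault(name, []).append((int(start), int(count)))
    return ranges


def mapping_allowed(user, user_uid, outside, length, subid):
    """Approximate whether host range [outside, outside+length) may be mapped.

    Assumed rules: a user may map their own uid (length 1), or any range
    fully contained in one of their delegated subordinate ranges.
    """
    if length == 1 and outside == user_uid:
        return True
    return any(start <= outside and outside + length <= start + count
               for start, count in subid.get(user, []))


SUBUID = """\
alice:100000:65536
bob:165536:65536
charlotte:231072:65536
"""

if __name__ == "__main__":
    subid = parse_subid(SUBUID)
    print(mapping_allowed("alice", 2001, 2002, 1, subid))        # bob's uid
    print(mapping_allowed("alice", 2001, 2001, 1, subid))        # own uid
    print(mapping_allowed("alice", 2001, 100000, 65536, subid))  # own subuid range
```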
  • 30. SUID rootless • rootless • root • (200 ) • int overflow …… • uid_map/gid_map • e.g. user uid • ( 1 1 newuidmap …… )
  • 31. 2: podman
    • podman ( )
    • tap0
    • grep slirp4netns
    • tap0
    • → TUN/TAP

    [alice@rutledge ~]$ podman exec -lit /bin/bash
    [root@1209b4cedd82 /]# curl -I 'https://retrieva.jp/'
    HTTP/1.1 200 OK
    :
    [root@1209b4cedd82 /]# yum install -y iproute
    [root@934bf6e4252b /]# ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue ...
    :
    2: tap0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel ...
    :
    [root@934bf6e4252b /]# exit
    [alice@rutledge ~]$ ps aux | grep tap0
    alice 11881 0.0 0.2 4592 1856 pts/0 S 19:22 0:00 /usr/bin/slirp4netns -c -e 3 -r 4 11870 tap0
    [alice@rutledge ~]$ kill 11870
    [alice@rutledge ~]$ podman exec -it $(podman ps -ql) ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue ...
    :
  • 32. slirp4netns: slirp
    • slirp SLIP (Serial Line Internet Protocol)
    • SLIP PPP
    • net slirp4netns
    • QEMU
    • IP
      • default route: 10.0.2.2/24
      • DNS forward: 10.0.2.3
      • DHCP addresses: 10.0.2.15 - 10.0.2.31

    [alice@rutledge ~]$ podman exec -lit /bin/bash
    [root@934bf6e4252b /]# curl 'https://retrieva.jp/' -I
    HTTP/1.1 200 OK
    :
    [root@a041f01d3221 /]# ip a
    1: lo: <LOOPBACK,UP,LOWER_UP>...
        link/loopback 00:00:00:00:00:00 brd ...
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: tap0: <BROADCAST,UP,LOWER_UP>...
        link/ether 0e:3c:3c:65:d9:82 brd ...
        inet 10.0.2.100/24 brd 10.0.2.255 scope global tap0
           valid_lft forever preferred_lft forever
        inet6 fe80::c3c:3cff:fe65:d982/64 scope link
           valid_lft forever preferred_lft forever
    [root@a041f01d3221 /]# ip route
    default via 10.0.2.2 dev tap0
    10.0.2.0/24 dev tap0 proto kernel scope link src 10.0.2.100
    [root@934bf6e4252b /]# exit
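As a quick sanity check of the addressing listed above, the standard `ipaddress` module can confirm that the gateway and DNS forwarder sit on the same /24 as the container's tap0 address. The variable names are illustrative only; the addresses are the ones from the slide and the `ip a` output.

```python
import ipaddress

# Address assigned to tap0 inside the container (from the `ip a` output).
tap0 = ipaddress.ip_interface("10.0.2.100/24")

# Addresses provided by slirp4netns according to the slide.
gateway = ipaddress.ip_address("10.0.2.2")
dns_forwarder = ipaddress.ip_address("10.0.2.3")
dhcp_first = ipaddress.ip_address("10.0.2.15")
dhcp_last = ipaddress.ip_address("10.0.2.31")

# Both service addresses should be directly reachable on the tap0 network.
on_link = {str(addr): addr in tap0.network for addr in (gateway, dns_forwarder)}

# Number of addresses in the DHCP range, counting both endpoints.
dhcp_pool_size = int(dhcp_last) - int(dhcp_first) + 1

if __name__ == "__main__":
    print("network:  ", tap0.network)
    print("broadcast:", tap0.network.broadcast_address)
    print("on-link:  ", on_link)
    print("DHCP pool:", dhcp_pool_size, "addresses")
```

The computed network and broadcast address agree with the `ip route` and `brd` values in the transcript, and 10.0.2.100 lies outside the listed DHCP range.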
  • 33. slirp4netns: slirp netns
    • root
    • net
    • SUID
    • RHEL8 slirp listen
    • slirp4netns-0.1-2 bind

    [alice@rutledge ~]$ ls -l $(which slirp4netns)
    -rwxr-xr-x. 1 root root 76264 8 11 2018 /usr/bin/slirp4netns
    [alice@rutledge ~]$ podman run -p 10080:80 centos:centos7
    port bindings are not yet supported by rootless containers
    [alice@rutledge ~]$ rpm -q slirp4netns
    slirp4netns-0.1-1.dev.gitc4e1bc5.el8+1463+3d8a3dce.x86_64
  • 34. 3: CoW
    • OS
    • CoW(Copy-on-Write) +
    • Docker dm-thin overlayfs root
    • podman info
    • GraphDriverName vfs
    • GraphRoot ~/.local/ storage
    • RunRoot /run/user/${UID}/run
    • RunRoot bind (hosts resolv.conf ) GraphRoot
    • vfs-layers/mountpoints.json

    [alice@rutledge ~]$ podman info
    :
    store:
      ContainerStore:
        number: 1
      GraphDriverName: vfs
      GraphOptions: []
      GraphRoot: /home/alice/.local/share/containers/storage
      GraphStatus: {}
      ImageStore:
        number: 1
      RunRoot: /run/user/2001/run
    [alice@rutledge ~]$ find /run/user/2001/run
    :
    /run/user/2001/run/vfs-containers/d1ab...eefd
    /run/user/2001/run/vfs-containers/d1ab...eefd/userdata
    :
    /run/user/2001/run/vfs-layers
    /run/user/2001/run/vfs-layers/mountpoints.json
  • 35. podman (vfs)
    • mountpoints.json
    • jack 100999 (uid_map)

    [alice@rutledge ~]$ jq '.[].path' /run/user/2001/run/vfs-layers/mountpoints.json
    "/home/alice/.local/share/containers/storage/vfs/dir/aeaa...458a"
    [alice@rutledge ~]$ ll /home/alice/.local/share/containers/storage/vfs/dir/aeaa...458a/
    total 16
    -rw-r--r--. 1 alice alice 12082 Mar 6 02:36 anaconda-post.log
    lrwxrwxrwx. 1 alice alice     7 Mar 6 02:34 bin -> usr/bin
    drwxr-xr-x. 2 alice alice     6 Mar 6 02:34 dev
    [alice@rutledge ~]$ ll /home/alice/.local/share/containers/storage/vfs/dir/aeaa...7458a/home/
    total 0
    drwx------. 2 100999 100999 62 Apr 15 21:19 jack
  • 36. podman(vfs)
    • centos:centos7 210M
    • 10 210M*10=2G
    • CoW
    • → 2G
    • CoW

    [alice@rutledge ~]$ du -sh .local/share/containers/storage/vfs/dir/aeaa...7458a/
    210M .local/share/containers/storage/vfs/dir/aeaa...458a/
    [alice@rutledge ~]$ df -h .local/share/containers/storage/
    Filesystem            Size Used Avail Use% Mounted on
    /dev/mapper/rhel-home  20G 4.2G   16G  21% /home
    [alice@rutledge ~]$ seq 10 | xargs -I{} podman run -d centos:centos7 sleep inf
    [alice@rutledge ~]$ df -h .local/share/containers/storage/
    Filesystem            Size Used Avail Use% Mounted on
    /dev/mapper/rhel-home  20G 6.3G   14G  32% /home
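The disk usage figures above can be cross-checked with a little arithmetic: ten full copies of a 210M image should account for roughly the growth that `df` reports. The snippet assumes `du -h` and `df -h` report binary units (MiB and GiB), and the variable names are illustrative.

```python
IMAGE_SIZE_MIB = 210   # `du -sh` of one vfs layer directory
CONTAINERS = 10        # containers started via `seq 10 | xargs ... podman run`

# Without CoW, every container gets its own full copy of the image.
estimated_gib = IMAGE_SIZE_MIB * CONTAINERS / 1024

# Growth of the "Used" column between the two `df -h` runs.
used_before_gib = 4.2
used_after_gib = 6.3
observed_gib = round(used_after_gib - used_before_gib, 1)

if __name__ == "__main__":
    print(f"estimated: {estimated_gib:.2f} GiB")
    print(f"observed:  {observed_gib:.1f} GiB")
```

The two values agree to within the rounding of `df -h`, which is consistent with the vfs driver copying the whole image for each container.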
  • 37. fuse-overlayfs(1)
    • vfs
    • fuse-overlayfs user overlayfs
    • ~/.config/containers/storage.conf
      • storage.driver="overlay"
      • storage_options.mount_program="/usr/bin/fuse-overlayfs"
    • podman storage
    • vfs XFS reflink shallow copy/CoW
    • orz

    [alice@rutledge ~]$ podman rm -f --all
    [alice@rutledge ~]$ podman rmi -f --all
    [alice@rutledge ~]$ su -c 'rm /home/alice/.local/' charlotte #
    [alice@rutledge ~]$ mkdir -p .config/containers/
    [alice@rutledge ~]$ cat .config/containers/storage.conf
    [storage]
    driver = "overlay"
    [storage.options]
    mount_program = "/usr/bin/fuse-overlayfs"
  • 38. podman with fuse-overlayfs
    • / fuse-overlayfs
    • ~/.local/share/containers/storage/*/ overlayfs
      • diff: CoW
      • work: overlayfs
      • merged: overlayfs
    • → mnt

    [alice@rutledge ~]$ podman run -d centos:centos7 sleep inf
    [alice@rutledge ~]$ podman exec -l findmnt /
    TARGET SOURCE         FSTYPE              OPTIONS
    /      fuse-overlayfs fuse.fuse-overlayfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
    [alice@rutledge ~]$ ll /home/alice/.local/share/containers/storage/overlay/*
    /home/alice/.local/share/containers/storage/overlay/2bbb2f38cf08544b67e60954e9da373c67f2d5658a7e6a074afc5818c9805ebe:
    total 8
    drwxr-xr-x. 4 alice alice 28 4 16 23:13 diff
    -rw-r--r--. 1 alice alice 26 4 16 23:13 link
    -rw-rw-r--. 1 alice alice 28 4 16 23:13 lower
    drwx------. 2 alice alice  6 4 16 23:13 merged
    drwx------. 3 alice alice 18 4 16 23:13 work
    :
  • 39. rootless • su (uid_map ) • newuidmap(1) / newgidmap(1) (SUID ) • net (veth ) • slirp4netns ! • (bind ) • bind overlayfs (CoW ) • fuse-overlayfs userns • XFS reflink
  • 40. : rootless
    1.
    2. user + mnt + net
    3. [NEW] newuidmap(1) / newgidmap(1)
    4. [UPDATE] pivot_root bind fuse-overlayfs
    5. oldroot
    6. [NEW] fuse-overlayfs pivot_root mnt
       • dev/ console tty bind mount sys/ proc/
    7. pivot_root
    8. oldroot exec chroot
    9. oldroot lazy umount
    10. [NEW] slirp4netns
    11. [NEW] ip route
  • 41.
    • Rootless
      https://www.slideshare.net/AkihiroSuda/rootless
    • Namespaces in operation, part 1: namespaces overview [LWN.net]
      https://lwn.net/Articles/531114/
    • Namespaces in operation, part 5: User namespaces [LWN.net]
      https://lwn.net/Articles/532593/
    • Filesystem mounts in user namespaces [LWN.net]
      https://lwn.net/Articles/652468/
    • Anatomy of a user namespaces vulnerability [LWN.net]
      https://lwn.net/Articles/543273/
    • Man page of USER_NAMESPACES
      https://linuxjm.osdn.jp/html/LDP_man-pages/man7/user_namespaces.7.html
    • util-linux/unshare.c at master · karelzak/util-linux
      https://github.com/karelzak/util-linux/blob/master/sys-utils/unshare.c
    • shadow/newuidmap.c at master · shadow-maint/shadow
      https://github.com/shadow-maint/shadow/blob/master/src/newuidmap.c
    • hnakamur’s blog: QEMU Wiki Slirp Tap
      http://hnakamur.blogspot.com/2009/08/qemu-wikislirptap.html
    • slirp4netns/main.c at master · rootless-containers/slirp4netns
      https://github.com/rootless-containers/slirp4netns/blob/master/main.c
    • Working with the Container Storage library and tools in Red Hat Enterprise Linux
      https://www.redhat.com/en/blog/working-container-storage-library-and-tools-red-hat-enterprise-linux
    • The State of Rootless Containers
      https://www.slideshare.net/AkihiroSuda/the-state-of-rootless-containers