
Docker Container: isolation and security

- Isolation - Linux Namespaces
- Isolation - Control Groups
- Container Security

  1. Docker Container: Isolation and Security (Eric Fu)
  2. chroot: In UNIX, everything is a file.
  3. Overview
      - Isolation - Linux Namespaces
      - Isolation - Control Groups
      - Container Security
  4. Isolation - Linux Namespaces: Process-level Isolation
  5. Linux Namespaces

          Category             Clone flag       Kernel version
          Mount namespaces     CLONE_NEWNS      Linux 2.4.19
          UTS namespaces       CLONE_NEWUTS     Linux 2.6.19
          IPC namespaces       CLONE_NEWIPC     Linux 2.6.19
          PID namespaces       CLONE_NEWPID     Linux 2.6.24
          Network namespaces   CLONE_NEWNET     Linux 2.6.24, completed in 2.6.29
          User namespaces      CLONE_NEWUSER    Linux 2.6.23, completed in 3.8
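Each flag in the table has a matching entry under /proc/<pid>/ns/ (mnt, uts, ipc, pid, net, user). readlink() on such an entry returns a string of the form "type:[inode]", and two processes share that namespace when the strings are equal. The helpers below are a minimal sketch of that lookup; the function names are hypothetical and not part of any library.

```c
#define _GNU_SOURCE
#include <sched.h>      /* CLONE_NEW* flags */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>     /* readlink */

/* Map one CLONE_NEW* flag from the table to its entry name under /proc/<pid>/ns/. */
const char* ns_entry_name(int flag) {
    switch (flag) {
    case CLONE_NEWNS:   return "mnt";
    case CLONE_NEWUTS:  return "uts";
    case CLONE_NEWIPC:  return "ipc";
    case CLONE_NEWPID:  return "pid";
    case CLONE_NEWNET:  return "net";
    case CLONE_NEWUSER: return "user";
    default:            return NULL;  /* not one of the six flags in the table */
    }
}

/* Build a path such as "/proc/1234/ns/uts"; returns snprintf's result, or -1 for an unknown flag. */
int ns_path(pid_t pid, int flag, char* buf, size_t n) {
    const char* name = ns_entry_name(flag);
    if (name == NULL) return -1;
    return snprintf(buf, n, "/proc/%d/ns/%s", (int)pid, name);
}

/* Read the "type:[inode]" link target into buf (NUL-terminated); returns its length or -1. */
ssize_t read_ns_link(pid_t pid, int flag, char* buf, size_t n) {
    char path[64];
    if (n == 0 || ns_path(pid, flag, path, sizeof path) < 0) return -1;
    ssize_t len = readlink(path, buf, n - 1);
    if (len >= 0) buf[len] = '\0';
    return len;
}

/* Two processes are in the same namespace when their link targets are identical. */
int same_namespace(const char* link_a, const char* link_b) {
    return strcmp(link_a, link_b) == 0;
}
```

For example, comparing read_ns_link(getpid(), CLONE_NEWUTS, ...) in the parent and in the cloned child should give different strings only when the child was created with CLONE_NEWUTS.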
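  6. clone()

          #define _GNU_SOURCE
          #include <sched.h>
          #include <signal.h>
          #include <stdio.h>
          #include <sys/wait.h>
          #include <unistd.h>

          #define STACK_SIZE (1024 * 1024)

          static char container_stack[STACK_SIZE];
          char* const container_args[] = {"/bin/bash", NULL};

          int container_main(void* arg) {
              // Replace the child with a shell
              execv(container_args[0], container_args);
              perror("execv");  // Only reached if execv fails
              return 1;
          }

          int main() {
              // The stack grows down, so pass the top of the buffer
              int container_pid = clone(container_main, container_stack + STACK_SIZE, SIGCHLD, NULL);
              waitpid(container_pid, NULL, 0);
              return 0;
          }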
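  7. UTS Namespace (CLONE_NEWUTS)

      Isolates system identifiers: nodename and domainname. main() passes CLONE_NEWUTS | SIGCHLD as the clone() flags, so the hostname change below is visible only inside the new namespace.

          int container_main(void* arg) {
              // Length excludes the terminating NUL: strlen("container") == 9
              sethostname("container", 9);
              // Open a shell
              execv(container_args[0], container_args);
              // Should never be here
              return 1;
          }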
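A hard-coded length in sethostname() is easy to get wrong. The sketch below, with hypothetical helper names, passes strlen() instead and reads the nodename back through uname() and gethostname(), which both report the value for the caller's UTS namespace.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>
#include <unistd.h>

/* Set the hostname using its exact length. Only the caller's UTS namespace is
   affected; this requires CAP_SYS_ADMIN over that namespace (else EPERM). */
int set_container_hostname(const char* name) {
    return sethostname(name, strlen(name));
}

/* Copy the nodename that uname() reports for this process's UTS namespace. */
int uts_nodename(char* buf, size_t n) {
    struct utsname u;
    if (uname(&u) != 0) return -1;
    snprintf(buf, n, "%s", u.nodename);
    return 0;
}

/* gethostname() reads the same UTS field, so both views should agree. */
int hostname_matches_uname(void) {
    char a[256], b[256];
    if (uts_nodename(a, sizeof a) != 0) return 0;
    if (gethostname(b, sizeof b) != 0) return 0;
    b[sizeof b - 1] = '\0';
    return strcmp(a, b) == 0;
}
```

Calling set_container_hostname("container") in the cloned child and then uts_nodename() in both parent and child should show two different names; without a new UTS namespace and the needed privilege, the call fails with EPERM.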
  8. IPC Namespace (CLONE_NEWIPC)

      Isolates IPC resources: System V IPC objects and POSIX message queues. The queue created on the host is not visible inside the container:

          root@eric-vm:/home/eric/linux_namespace# ipcmk -Q
          Message queue id: 0
          root@eric-vm:/home/eric/linux_namespace# ipcs -q

          ------ Message Queues --------
          key        msqid      owner      perms      used-bytes   messages
          0xd5467105 0          root       644        0            0

          root@eric-vm:/home/eric/linux_namespace# ./test_ipc_ns
          Parent - start a container!
          Container - inside the container!
          root@container:/home/eric/linux_namespace# ipcs -q

          ------ Message Queues --------
          key        msqid      owner      perms      used-bytes   messages
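ipcmk -Q and ipcs -q can be approximated in C with msgget() and msgctl(). This is a sketch assuming Linux: MSG_INFO and struct msginfo are Linux extensions that need _GNU_SOURCE, and the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Roughly what `ipcmk -Q` does: create a new System V message queue
   in the caller's IPC namespace. Returns the queue id or -1. */
int create_queue(void) {
    int id = msgget(IPC_PRIVATE, IPC_CREAT | 0644);
    if (id < 0) perror("msgget");
    return id;
}

/* Number of message queues that currently exist in this IPC namespace
   (the msgpool field filled in by MSG_INFO); -1 on error. */
int queues_in_namespace(void) {
    struct msginfo info;
    if (msgctl(0, MSG_INFO, (struct msqid_ds*)&info) < 0) return -1;
    return info.msgpool;
}

/* Equivalent of `ipcrm -q <id>`. */
int remove_queue(int id) {
    return msgctl(id, IPC_RMID, NULL);
}
```

Called in a child created with CLONE_NEWIPC | SIGCHLD, queues_in_namespace() should return 0 even while the host's queue still exists.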
  9. PID Namespace (CLONE_NEWPID)

      Isolates the PID space: processes in different PID namespaces can have the same PID.

          eric@eric-vm:~/linux_namespace$ sudo ./test_pid_ns
          Parent (2536) - start a container!
          Container (1) - inside the container!

      Why does ps aux still show all processes? Because ps reads process information from /proc, and the container still sees the host's /proc mount.
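From Linux 4.1, /proc/<pid>/status contains an NSpid: line listing the process's PID in every PID namespace it belongs to. The leftmost value is relative to the PID namespace of the procfs mount, and the rightmost is the PID inside the innermost namespace (1 for the container's shell). This parsing sketch uses hypothetical helper names and only handles the line format, not reading the file.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse an "NSpid:" line, e.g. "NSpid:\t4321\t1", and return the last
   (innermost) PID, or -1 if the line is not an NSpid line. */
long innermost_pid(const char* line) {
    if (strncmp(line, "NSpid:", 6) != 0) return -1;
    const char* p = line + 6;
    long last = -1;
    char* end;
    for (;;) {
        long v = strtol(p, &end, 10);  /* strtol skips the tab separators */
        if (end == p) break;           /* no more numbers on the line */
        last = v;
        p = end;
    }
    return last;
}

/* Return the first (leftmost) PID on the line, or -1. */
long outermost_pid(const char* line) {
    if (strncmp(line, "NSpid:", 6) != 0) return -1;
    char* end;
    long v = strtol(line + 6, &end, 10);
    return end == line + 6 ? -1 : v;
}
```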
  10. Mount Namespace (CLONE_NEWNS)

      Isolates the set of filesystem mount points seen by a group of processes. Processes in different mount namespaces can have different views of the filesystem hierarchy. Mounting a fresh procfs inside the container makes ps show only the container's processes:

          mount("proc", "/proc", "proc", 0, NULL);

      Inside the container:

          / # ps aux
          PID   USER     TIME  COMMAND
              1 root      0:00 /bin/sh
              3 root      0:00 ps aux
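ps builds its process list by scanning the numeric directory names under /proc, which is why its output depends on which procfs instance is mounted there. A sketch of that scan, with a hypothetical helper name:

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

/* Count the all-digit entries (one per visible process) in a procfs
   directory such as "/proc". Returns -1 if the directory cannot be opened. */
int count_proc_pids(const char* proc_dir) {
    DIR* d = opendir(proc_dir);
    if (d == NULL) return -1;
    int count = 0;
    struct dirent* e;
    while ((e = readdir(d)) != NULL) {
        const char* s = e->d_name;
        if (*s == '\0') continue;
        while (isdigit((unsigned char)*s)) s++;
        if (*s == '\0') count++;  /* the whole name was digits */
    }
    closedir(d);
    return count;
}
```

On the host's /proc this counts every process; inside the container, after the mount() call above, it should count only the container's processes.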
  11. Mount a Real Docker Image

      Extract the image's filesystem:

          docker save alpine | undocker -i -o rootfs alpine

      Then mount and chroot into it:

          // System mount points
          mount("proc", "rootfs/proc", "proc", 0, NULL);
          mount("sysfs", "rootfs/sys", "sysfs", 0, NULL);
          mount("none", "rootfs/tmp", "tmpfs", 0, NULL);
          mount("udev", "rootfs/dev", "devtmpfs", 0, NULL);

          // Config files
          mount("conf/hosts", "rootfs/etc/hosts", "none", MS_BIND, NULL);
          mount("conf/hostname", "rootfs/etc/hostname", "none", MS_BIND, NULL);
          mount("conf/resolv.conf", "rootfs/etc/resolv.conf", "none", MS_BIND, NULL);

          // Chroot
          chdir("./rootfs");
          chroot("./");
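The mount list can also be kept as data and applied in a loop, which makes entries easier to add or audit. This is a sketch of that design choice, assuming it runs inside a new mount namespace before the chdir()/chroot() step; the struct, table, and function names are hypothetical, and targets are stored relative to the rootfs directory.

```c
#include <stdio.h>
#include <sys/mount.h>

/* One mount from the slide; target is relative to the rootfs directory. */
struct mount_spec {
    const char* source;
    const char* target;
    const char* fstype;
    unsigned long flags;
};

static const struct mount_spec container_mounts[] = {
    {"proc", "proc", "proc", 0},
    {"sysfs", "sys", "sysfs", 0},
    {"none", "tmp", "tmpfs", 0},
    {"udev", "dev", "devtmpfs", 0},
    {"conf/hosts", "etc/hosts", "none", MS_BIND},
    {"conf/hostname", "etc/hostname", "none", MS_BIND},
    {"conf/resolv.conf", "etc/resolv.conf", "none", MS_BIND},
};

/* Build "<rootfs>/<target>" into buf; returns 0 on success, -1 if truncated. */
int mount_target(const char* rootfs, const struct mount_spec* m, char* buf, size_t n) {
    int len = snprintf(buf, n, "%s/%s", rootfs, m->target);
    return (len < 0 || (size_t)len >= n) ? -1 : 0;
}

/* Apply every entry, stopping at the first failure. With dry_run set,
   only print what would be mounted (no privileges needed). */
int apply_mounts(const char* rootfs, int dry_run) {
    size_t count = sizeof container_mounts / sizeof container_mounts[0];
    char target[512];
    for (size_t i = 0; i < count; i++) {
        const struct mount_spec* m = &container_mounts[i];
        if (mount_target(rootfs, m, target, sizeof target) != 0) return -1;
        if (dry_run) {
            printf("mount %s -> %s (%s)\n", m->source, target, m->fstype);
        } else if (mount(m->source, target, m->fstype, m->flags, NULL) != 0) {
            perror(target);
            return -1;
        }
    }
    return 0;
}
```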
  12. User namespace (CLONE_NEWUSER)

      Isolates the user and group ID spaces: a process's UID and GID can differ inside and outside a user namespace.

          #include <stdio.h>
          #include <sys/types.h>

          // Write one "inside outside length" line to a uid_map or gid_map file
          void set_map(char* file, int inside_id, int outside_id, int len) {
              FILE* fd = fopen(file, "w");
              if (fd == NULL) {
                  perror(file);
                  return;
              }
              fprintf(fd, "%d %d %d", inside_id, outside_id, len);
              fclose(fd);
          }

          void set_uid_map(pid_t pid, int inside_id, int outside_id, int len) {
              char file[256];
              snprintf(file, sizeof(file), "/proc/%d/uid_map", pid);
              set_map(file, inside_id, outside_id, len);
          }

          // Unprivileged callers must first write "deny" to /proc/<pid>/setgroups (Linux 3.19+)
          void set_gid_map(pid_t pid, int inside_id, int outside_id, int len) {
              char file[256];
              snprintf(file, sizeof(file), "/proc/%d/gid_map", pid);
              set_map(file, inside_id, outside_id, len);
          }
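Each line written by set_map() follows the uid_map/gid_map format "inside outside length": IDs inside..inside+length-1 in the namespace correspond to outside..outside+length-1 in the parent namespace. For example, set_uid_map(pid, 0, 1000, 1) makes host UID 1000 appear as root (UID 0) in the container. A sketch of that translation, with hypothetical helper names:

```c
#include <stdio.h>

/* One line of /proc/<pid>/uid_map or gid_map. */
struct id_map {
    long inside;
    long outside;
    long length;
};

/* Parse text such as "0 1000 1"; returns 0 on success, -1 otherwise. */
int parse_id_map(const char* text, struct id_map* m) {
    return sscanf(text, "%ld %ld %ld", &m->inside, &m->outside, &m->length) == 3 ? 0 : -1;
}

/* Translate an ID seen inside the namespace to the parent's ID, or -1 if unmapped. */
long to_outside(const struct id_map* m, long inside_id) {
    if (inside_id < m->inside || inside_id >= m->inside + m->length) return -1;
    return m->outside + (inside_id - m->inside);
}

/* Translate a parent ID to the ID seen inside, or -1 if unmapped
   (unmapped IDs show up inside as the overflow ID, typically 65534). */
long to_inside(const struct id_map* m, long outside_id) {
    if (outside_id < m->outside || outside_id >= m->outside + m->length) return -1;
    return m->inside + (outside_id - m->outside);
}
```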
  13. Network namespace (CLONE_NEWNET)

      Preparation (on the host):

          brctl addbr br0
          ifconfig br0 192.168.10.1/24 up

      Host ($PID is the container's PID as seen from the host):

          ip link add veth0 type veth peer name veth1
          ip link set veth1 netns $PID
          brctl addif br0 veth0
          ip link set veth0 up

      Container:

          ip link set dev veth1 name eth0
          ip link set eth0 up
          ip link set lo up
          ip addr add 192.168.10.2/24 dev eth0
          ip route add default via 192.168.10.1
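The default route works because the gateway 192.168.10.1 (br0) and the container's 192.168.10.2 are in the same /24, so the gateway is reachable directly over eth0. A small sketch of that subnet check, with a hypothetical helper name:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <sys/socket.h>

/* Returns 1 if both IPv4 addresses fall in the same subnet of the given
   prefix length, 0 if not, and -1 on a parse error or invalid prefix. */
int same_subnet(const char* a, const char* b, int prefix_len) {
    struct in_addr x, y;
    if (prefix_len < 0 || prefix_len > 32) return -1;
    if (inet_pton(AF_INET, a, &x) != 1 || inet_pton(AF_INET, b, &y) != 1) return -1;
    /* Avoid shifting a 32-bit value by 32, which is undefined */
    uint32_t mask = prefix_len == 0 ? 0 : 0xFFFFFFFFu << (32 - prefix_len);
    return (ntohl(x.s_addr) & mask) == (ntohl(y.s_addr) & mask);
}
```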
  14. Network Topology
  15. Isolation - Control Groups: Resource Limiting
  16. Linux Control Groups
      - blkio (disk I/O)
      - cpu (CPU quota)
      - cpuset (CPU cores)
      - devices (device access)
      - memory
      - net_cls (network packet class ID)
      - net_prio (network packet priority)
      - hugetlb (HugeTLB pages)
      - cpuacct (CPU usage accounting)
      - freezer (suspend/resume groups of processes)
  17. Glance

          root@eric-vm:/sys/fs/cgroup# ls
          blkio  cpuacct      cpuset   freezer  memory   net_cls,net_prio  perf_event  systemd
          cpu    cpu,cpuacct  devices  hugetlb  net_cls  net_prio          pids
          root@eric-vm:/sys/fs/cgroup/cpu$ sudo mkdir test
          root@eric-vm:/sys/fs/cgroup/cpu/test$ ls
          cgroup.clone_children  cpuacct.stat   cpuacct.usage_percpu  cpu.cfs_quota_us  cpu.stat
          cgroup.procs           cpuacct.usage  cpu.cfs_period_us     cpu.shares        notify_o
  18. We have a CPU killer

          int main() {
              int i = 0;
              for (;;) i++;  // Spin forever
              return 0;
          }

      top:

          PID   USER  PR  NI  VIRT  RES  SHR  S  %CPU  %MEM  TIME+    COMMAND
          3985  eric  20  0   4224  648  576  R  99.9  0.1   0:15.53  deadloop
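  19. Usage

      Create a group (yes, just mkdir):

          sudo mkdir /sys/fs/cgroup/cpu/test

      Set a limit. With the default cpu.cfs_period_us of 100000 (100 ms), a quota of 20000 means 20% of one CPU:

          echo 20000 > /sys/fs/cgroup/cpu,cpuacct/test/cpu.cfs_quota_us

      Add a process to the group (on this system, cpu and cpu,cpuacct are the same hierarchy):

          echo 3985 >> /sys/fs/cgroup/cpu,cpuacct/test/tasks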
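The quota is relative to cpu.cfs_period_us: the group may run for quota microseconds in each period, so the share of one CPU is quota / period x 100%, and a quota larger than the period allows more than one CPU. A sketch of the arithmetic and of the echo step in C, with hypothetical helper names (on cgroup v2 the corresponding file is cpu.max, which takes "quota period" on one line):

```c
#include <stdio.h>

/* Quota in microseconds for a given percentage of one CPU.
   quota_for_percent(100000, 20) == 20000, as in the slide. */
long quota_for_percent(long period_us, long percent_of_one_cpu) {
    return period_us * percent_of_one_cpu / 100;
}

/* Write a number to a cgroup control file, like
   `echo 20000 > .../cpu.cfs_quota_us` or `echo 3985 >> .../tasks`. */
int write_cgroup_value(const char* path, long value) {
    FILE* f = fopen(path, "w");
    if (f == NULL) {
        perror(path);
        return -1;
    }
    int ok = fprintf(f, "%ld\n", value) > 0;
    if (fclose(f) != 0) ok = 0;  /* the kernel may reject the value when it is flushed */
    return ok ? 0 : -1;
}
```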
  20. Container Security
  21. "Container"
      - Linux kernel namespaces provide the isolation (hence “container”) in which we place one or more processes.
      - Linux kernel cgroups (“control groups”) provide resource limiting and accounting (CPU, memory, I/O bandwidth, etc.).
  22. Container Properties
      - A shared kernel across all containers on a single host.
      - A unique filesystem per container, built from layered CoW (copy-on-write) union filesystems.
      - Linux namespaces are shareable (e.g. a Kubernetes “pod”).
      - One process per container.
  23. Linux Capabilities

      Add the capabilities a container needs and drop the unnecessary ones.

          $ docker run --rm -ti busybox sh
          / # hostname foo
          hostname: sethostname: Operation not permitted

          $ docker run --rm -ti --cap-add=SYS_ADMIN busybox sh
          / # hostname foo
          <hostname changed>

          $ docker run --rm -ti --cap-drop=NET_RAW busybox sh
          / # ping 8.8.8.8
          ping: permission denied (are you root?)
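Whether a container kept CAP_NET_RAW or CAP_SYS_ADMIN can be checked from inside it: the CapEff: line of /proc/self/status is a hexadecimal bit mask, where bit 13 is CAP_NET_RAW and bit 21 is CAP_SYS_ADMIN. A sketch with hypothetical helper names:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Capability bit numbers, as defined in <linux/capability.h>. */
#define CAP_NET_RAW   13
#define CAP_SYS_ADMIN 21

/* Parse a "CapEff:\t<hex>" line into a mask; returns 0 on success, -1 otherwise. */
int parse_capeff(const char* line, unsigned long long* mask) {
    if (strncmp(line, "CapEff:", 7) != 0) return -1;
    char* end;
    *mask = strtoull(line + 7, &end, 16);
    return end == line + 7 ? -1 : 0;
}

/* Read the effective capability mask from a status file such as "/proc/self/status". */
int read_capeff(const char* status_path, unsigned long long* mask) {
    FILE* f = fopen(status_path, "r");
    if (f == NULL) return -1;
    char line[256];
    int rc = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_capeff(line, mask) == 0) {
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}

/* Nonzero if capability number cap is set in mask. */
int has_cap(unsigned long long mask, int cap) {
    return (int)((mask >> cap) & 1ULL);
}
```

In a container started with --cap-drop=NET_RAW, has_cap(mask, CAP_NET_RAW) should return 0, matching the ping failure above.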
  24. Linux Capabilities
  25. Seccomp

      Block specific syscalls from being used by container binaries.

          $ cat policy.json
          {
              "defaultAction": "SCMP_ACT_ALLOW",
              "syscalls": [
                  { "name": "chmod", "action": "SCMP_ACT_ERRNO" }
              ]
          }
          $ docker run --rm -it --security-opt seccomp:policy.json busybox chmod 640 /etc/resolv.conf
          chmod: /etc/resolv.conf: Operation not permitted
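A profile of the same shape as policy.json can be generated for a list of syscalls: allow everything by default and return an error for each listed name. This is a sketch with a hypothetical function name; it does not escape JSON strings and assumes plain syscall names.

```c
#include <stdio.h>
#include <string.h>

/* Write {"defaultAction": "SCMP_ACT_ALLOW", "syscalls": [...]} into buf, with one
   SCMP_ACT_ERRNO rule per name. Returns the length written, or -1 if buf is
   too small or a name is empty. */
int build_deny_policy(const char* const* syscalls, size_t count, char* buf, size_t n) {
    int w = snprintf(buf, n, "{\"defaultAction\": \"SCMP_ACT_ALLOW\", \"syscalls\": [");
    if (w < 0 || (size_t)w >= n) return -1;
    size_t used = (size_t)w;
    for (size_t i = 0; i < count; i++) {
        if (strlen(syscalls[i]) == 0) return -1;
        w = snprintf(buf + used, n - used, "%s{\"name\": \"%s\", \"action\": \"SCMP_ACT_ERRNO\"}",
                     i == 0 ? "" : ", ", syscalls[i]);
        if (w < 0 || (size_t)w >= n - used) return -1;
        used += (size_t)w;
    }
    w = snprintf(buf + used, n - used, "]}");
    if (w < 0 || (size_t)w >= n - used) return -1;
    return (int)(used + (size_t)w);
}
```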
  26. AppArmor/SELinux

      Limit access to specific filesystem paths in a container. Sample profile: https://raw.githubusercontent.com/jessfraz/bane/master/docker-nginx-sample

          $ docker run --rm -ti --security-opt="apparmor:docker-nginx-sample" -p 80:80 nginx bash
          root@6da5a2a930b9:/# top
          bash: /usr/bin/top: Permission denied
          root@6da5a2a930b9:/# touch ~/thing
          touch: cannot touch 'thing': Permission denied
  27. Attack a Container!

      The “attack surface”:
      - Host <-> Container
      - Container <-> Container
      - External -> Container
      - Application Security
  28. Host <-> Container: protecting the host from containers

      Version numbers in parentheses are Docker releases unless marked as kernel versions.
      - Threat: DoS the host (use up CPU, memory, disk), fork bomb.
        Mitigation: cgroup controls, disk quotas (1.12), kernel pids limit (1.11 + kernel 4.3).
      - Threat: access host/private information.
        Mitigation: namespace configuration; AppArmor/SELinux profiles, seccomp (1.10).
      - Threat: kernel modification/inserting a module.
        Mitigation: capabilities (already dropped); seccomp, LSMs; don't run in --privileged mode.
      - Threat: Docker administrative access (API socket access).
        Mitigation: don't share the Docker UNIX socket without Authz plugin limitations; use TLS certificates for TCP endpoint configurations.
  29. Container <-> Container: malicious or multi-tenant
      - Threat: DoS other containers (a noisy neighbor using a significant share of CPU, memory, disk).
        Mitigation: cgroup controls, disk quotas (1.12), kernel pids limit (1.11 + kernel 4.3).
      - Threat: access another container's information (PIDs, files, etc.).
        Mitigation: namespace configuration; AppArmor/SELinux profile for containers.
      - Threat: Docker API access (full control over other containers).
        Mitigation: don't share the Docker UNIX socket without Authz plugin limitations (1.10); use TLS certificates for TCP endpoint configurations.
  30. External -> Container: the big, bad Internet
      - Threat: DDoS attacks.
        Mitigation: cgroup controls, disk quotas (1.12), kernel pids limit (1.11 + kernel 4.3); proactive monitoring infrastructure/operational readiness.
      - Threat: malicious (remote) access.
        Mitigation: appropriate application security model; no weak/default passwords!; --read-only filesystem (limits the blast radius).
      - Threat: unpatched exploits (underlying OS layers).
        Mitigation: vulnerability scanning (IBM Bluemix, Docker Data Center, CoreOS Clair, Red Hat “SmartState” CloudForms (with Black Duck)).
  31. Application Security
      - Significant container benefit: provided protections are in place (seccomp, LSMs, dropped capabilities, user namespaces), an exploited application has a greatly reduced ability to inflict harm beyond the container “walls”.
      - Proper handling of secrets through the dev/build/deploy process (for example, no passwords in the Dockerfile).
      - Unnecessary services not exposed externally (shared namespaces; internal/management networks).
      - Secure coding/design principles.
  32. Thank You!
