Docker


What happens when you start a container with docker?

Published in: Software

  1. 1. ramichen@tencent.com
  2. 2. An old interview question • What happens when you open a website? • https://github.com/alex/what-happens-when
  3. 3. What happens when you start a container with docker?
  4. 4. A simple docker example
      root@boot2docker:/home/docker# ip ad show eth1
      4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
          link/ether 08:00:27:91:99:33 brd ff:ff:ff:ff:ff:ff
          inet 192.168.59.103/24 brd 192.168.59.255 scope global eth1
             valid_lft forever preferred_lft forever
          inet6 fe80::a00:27ff:fe91:9933/64 scope link
             valid_lft forever preferred_lft forever
      root@boot2docker:/home/docker# docker run -d -P redis
      6f858e1563a56574031a61e65fb8ab356752d03440b24d65739eed64f2ef84df
      root@boot2docker:/home/docker# docker ps
      CONTAINER ID  IMAGE         COMMAND               CREATED        STATUS        PORTS                    NAMES
      6f858e1563a5  redis:latest  "/entrypoint.sh redi  3 seconds ago  Up 2 seconds  0.0.0.0:49154->6379/tcp  kickass_colden
      root@boot2docker:/home/docker# docker run -it --entrypoint /bin/bash redis
      root@63d30ea140b2:/data# redis-cli -h 192.168.59.103 -p 49154
      192.168.59.103:49154> set k 123
      OK
      192.168.59.103:49154> get k
      "123"
  5. 5. What happened here • We created a container with its own filesystem, network stack, process space, and resource limits. • We started a redis-server in the container. • We created another container and ran redis-cli in it to connect to the previous redis-server through the host IP and the mapped port.
  6. 6. How this happened • What is a redis image? How is it made? • What is a container? How does it get its own filesystem, network stack, process space, and resource limits? • How does a container start?
  8. 8. What is a redis image
      FROM dockerfile/ubuntu
      # Install Redis.
      RUN \
        cd /tmp && \
        wget http://download.redis.io/redis-stable.tar.gz && \
        tar xvzf redis-stable.tar.gz && \
        cd redis-stable && \
        make && \
        make install && \
        cp -f src/redis-sentinel /usr/local/bin && \
        mkdir -p /etc/redis && \
        cp -f *.conf /etc/redis && \
        rm -rf /tmp/redis-stable* && \
        sed -i 's/^\(bind .*\)$/# \1/' /etc/redis/redis.conf && \
        sed -i 's/^\(daemonize .*\)$/# \1/' /etc/redis/redis.conf && \
        sed -i 's/^\(dir .*\)$/# \1\ndir \/data/' /etc/redis/redis.conf && \
        sed -i 's/^\(logfile .*\)$/# \1/' /etc/redis/redis.conf
      # Define mountable directories.
      VOLUME ["/data"]
      # Define working directory.
      WORKDIR /data
      # Define default command.
      CMD ["redis-server", "/etc/redis/redis.conf"]
      # Expose ports.
      EXPOSE 6379
  9. 9. Image • A read-only layer is called an image. An image never changes. • Each image may depend on one other image, which forms the layer beneath it. We say that the lower image is the parent of the upper image.
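
The parent relationship described on this slide can be pictured as a singly linked chain of read-only layers. The following is a minimal sketch in Go, not Docker's actual data model: the `Layer` type and the `chain` function are hypothetical names chosen for illustration, and the IDs are short layer IDs that appear in the deck's diagrams.

```go
package main

import "fmt"

// Layer is a hypothetical, simplified stand-in for a read-only image layer.
type Layer struct {
	ID     string
	Parent *Layer // nil for a root (base) image
}

// chain returns the layer IDs from the given layer down to its root image.
func chain(l *Layer) []string {
	var ids []string
	for cur := l; cur != nil; cur = cur.Parent {
		ids = append(ids, cur.ID)
	}
	return ids
}

func main() {
	base := &Layer{ID: "511136ea3c5a"}
	middle := &Layer{ID: "df7546f9f060", Parent: base}
	top := &Layer{ID: "ea13149945cb", Parent: middle}

	// Walk from the top image to the base image.
	fmt.Println(chain(top))
}
```

Because every layer is immutable, several images can share the same parent without copying it.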
  11. 11. How to make an image • Use a Dockerfile • Use docker commit manually (not recommended)
  12. 12. Create a root image • https://github.com/docker/docker/blob/master/contrib/mkimage-busybox.sh • https://github.com/docker/docker/blob/master/docs/articles/baseimages.md
  14. 14. What is a container? • A Linux container is a copy of a Linux environment located in a filesystem. It resembles a jail environment but uses Linux namespaces: it runs its own init process and has a separate process space, a separate filesystem, and a separate network stack, all virtualized by the root OS running on the hardware.
  15. 15. Concept of image and container
      • A Docker image is a layer in the filesystem
      • A container is two layers
        - Layer one is the init layer, based on the image
        - Layer two is the actual container content
      Diagram, layers listed from the base up:
        Image (RO): 511136ea3c5a, df7546f9f060, ea13149945cb, 4986bf8c1536
        Container (RW): 142b6a3eae40-init, 142b6a3eae40
      Contents of the init layer: /dev, /dev/console, /dev/shm, /etc, /etc/hostname, /etc/hosts, /etc/mtab -> /proc/mounts
  17. 17. Linux kernel namespaces
      • UTS (hostname), Mount (mount points), IPC (System V IPC), User (UIDs), PID (processes), Net (network stack)
      • The kernel namespace API: clone, setns, unshare
      • The /proc/[pid]/ns/ directory
        $ ls -l /proc/$$/ns
        lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 ipc -> ipc:[4026531839]
        lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 mnt -> mnt:[4026531840]
        lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 net -> net:[4026531956]
        lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 pid -> pid:[4026531836]
        lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 user -> user:[4026531837]
        lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 uts -> uts:[4026531838]
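
The same listing can be produced programmatically. The sketch below is a rough Go equivalent of `ls -l /proc/$$/ns`, assuming a Linux host; `parseNSLink` is a hypothetical helper name, and the program simply reports an error on systems without `/proc`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// parseNSLink splits a link target such as "pid:[4026531836]" into the
// namespace type ("pid") and the inode number ("4026531836").
func parseNSLink(target string) (nsType, inode string, ok bool) {
	i := strings.Index(target, ":[")
	if i < 0 || !strings.HasSuffix(target, "]") {
		return "", "", false
	}
	return target[:i], target[i+2 : len(target)-1], true
}

func main() {
	dir := "/proc/self/ns" // Linux only
	entries, err := os.ReadDir(dir)
	if err != nil {
		fmt.Println("cannot read", dir+":", err)
		return
	}
	for _, e := range entries {
		// Each entry is a symlink whose target names the namespace.
		target, err := os.Readlink(filepath.Join(dir, e.Name()))
		if err != nil {
			continue
		}
		if nsType, inode, ok := parseNSLink(target); ok {
			fmt.Printf("%-8s inode %s\n", nsType, inode)
		}
	}
}
```

Two processes are in the same namespace exactly when the corresponding links show the same inode number.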
  18. 18. setns • Reassociates a process with a namespace • int setns(int fd, int nstype); • CLONE_NEWIPC/CLONE_NEWNET/CLONE_NEWNS/CLONE_NEWPID/CLONE_NEWUSER/CLONE_NEWUTS • Each process has a /proc/[pid]/ns/ subdirectory containing one entry for each namespace that supports being manipulated by setns(2)
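  19. 19. Join pid namespace
      // joinNS moves the calling process into every namespace that has a path set.
      func joinNS(namespaces []configs.Namespace) error {
          for _, ns := range namespaces {
              if ns.Path != "" {
                  // Open the namespace file, e.g. /proc/[pid]/ns/pid.
                  f, err := os.OpenFile(ns.Path, os.O_RDONLY, 0)
                  if err != nil {
                      return err
                  }
                  // setns(2) takes the open descriptor and the namespace type.
                  err = system.Setns(f.Fd(), uintptr(ns.Syscall()))
                  f.Close()
                  if err != nil {
                      return err
                  }
              }
          }
          return nil
      }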
  21. 21. Storage Driver • Docker currently implements vfs, aufs, device mapper, btrfs, overlayfs, and zfs. • A storage driver should have the following features - Copy on write - Shared memory cache • Performance: http://developerblog.redhat.com/2014/09/30/overview-storage-scalability-docker/
  22. 22. Aufs • Works at the file level • Combines multiple branches in a specific order • Each branch is just a normal directory • Opening a file - look it up in each branch, starting from the top, and open the first one found - on an attempt to write to it, copy it to the read-write (top) branch, then open the copy - that "copy-up" operation can take a while if the file is big! • Deleting a file - a whiteout file is created
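
The lookup, copy-up, and whiteout rules above can be modelled with a toy in-memory union. This is a sketch for intuition only, not how aufs is implemented: the `union` and `branch` types and their methods are hypothetical, files are plain strings, and whiteouts are only created in the top branch. The `.wh.` prefix is the naming convention aufs uses for whiteout files.

```go
package main

import "fmt"

// branch is one directory of the union: file name -> content.
type branch map[string]string

// union is a toy model of an aufs mount. branches[0] is the top
// read-write branch; the remaining branches are read-only.
type union struct {
	branches []branch
}

const whiteoutPrefix = ".wh."

// read looks the file up in each branch, starting from the top.
func (u *union) read(name string) (string, bool) {
	for _, b := range u.branches {
		if _, hidden := b[whiteoutPrefix+name]; hidden {
			return "", false // a whiteout hides copies in lower branches
		}
		if data, ok := b[name]; ok {
			return data, true
		}
	}
	return "", false
}

// appendTo writes to a file. If the file only exists in a lower branch,
// it is first copied up to the top branch, then the copy is modified.
func (u *union) appendTo(name, data string) {
	top := u.branches[0]
	if _, inTop := top[name]; !inTop {
		if old, ok := u.read(name); ok {
			top[name] = old // copy-up: the whole file is copied
		}
	}
	delete(top, whiteoutPrefix+name)
	top[name] += data
}

// remove deletes the file from the top branch and leaves a whiteout so
// that copies in the read-only branches stay hidden.
func (u *union) remove(name string) {
	top := u.branches[0]
	delete(top, name)
	top[whiteoutPrefix+name] = ""
}

func main() {
	u := &union{branches: []branch{
		{}, // top branch, read-write
		{"redis.conf": "port 6379\n"}, // image layer, read-only
	}}

	u.appendTo("redis.conf", "dir /data\n")
	data, _ := u.read("redis.conf")
	fmt.Print(data)
	fmt.Printf("lower branch still holds: %q\n", u.branches[1]["redis.conf"])

	u.remove("redis.conf")
	_, found := u.read("redis.conf")
	fmt.Println("visible after delete:", found)
}
```

The copy in `appendTo` is why the first write to a large file is slow: the whole file moves to the top branch even if only one byte changes.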
  23. 23. Device Mapper
  24. 24. Device Mapper • Works at the block level • Each container and each image gets its own block device • At any given time, it is possible to take a snapshot of a container or an image • Data and metadata are stored in sparse files • Putting the data on a real disk is recommended • Diagram labels: loop0, data, metadata, /dev/mapper/docker-{major}:{minor}-{inode}-pool, volume 1, volume 2
  25. 25. How to make its own filesystem
      1. mount every parent layer and the rw layer diff/$cid-init on mnt/$cid-init
      2. make extra files, dirs, and links in mnt/$cid-init
      3. mount every parent layer, the rw layer diff/$cid, and the ro layer diff/$cid-init on mnt/$cid
      4. setns to join the existing mount namespace
      5. mount proc/sysfs/tmpfs/cgroup…
      6. create devices, set up dev symlinks, init the filesystem
      7. chdir diff/$cid && chroot .
      Note: the underlined parts are done by the init process, the others by the Docker daemon. More in rootfs_linux.go.
      Diagram: layers 511136ea3c5a, df7546f9f060, ea13149945cb, 4986bf8c1536, 142b6a3eae40-init, and 142b6a3eae40 under /var/lib/docker/aufs/diff; mount point 142b6a3eae40 under /var/lib/docker/aufs/mnt
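
To make the layered mount in step 3 concrete, the sketch below only builds and prints an aufs mount command; it does not mount anything. It assumes the `br:<dir>=rw:<dir>=ro+wh` branch syntax for the aufs `-o` option; the helper name `aufsBranchOption` is hypothetical, and the order of the parent layers is an assumption based on the layer IDs in the diagram.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// aufsBranchOption builds the "br:" mount option: the read-write layer
// first, then each parent layer as a read-only branch.
// The exact option syntax is an assumption, see the text above.
func aufsBranchOption(diffRoot, rw string, parents []string) string {
	var b strings.Builder
	b.WriteString("br:" + path.Join(diffRoot, rw) + "=rw")
	for _, p := range parents {
		b.WriteString(":" + path.Join(diffRoot, p) + "=ro+wh")
	}
	return b.String()
}

func main() {
	root := "/var/lib/docker/aufs"
	cid := "142b6a3eae40"
	// Top-most parent first, base image last (assumed ordering).
	parents := []string{
		cid + "-init",
		"4986bf8c1536",
		"ea13149945cb",
		"df7546f9f060",
		"511136ea3c5a",
	}
	opt := aufsBranchOption(path.Join(root, "diff"), cid, parents)

	// Printed only: a real mount needs root privileges and the aufs module.
	fmt.Printf("mount -t aufs -o %s none %s\n", opt, path.Join(root, "mnt", cid))
}
```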
  27. 27. Network mode • Docker supports bridge/none/container/host modes • How does bridge mode work?
  28. 28. Bridge mode
      1. create the docker0 bridge, add eth1 to docker0, set up the docker0 iptables rules
      2. create a veth pair, attach one end to docker0, put the other end into the container's network namespace
      3. allocate a free IP
      4. set up iptables rules and the userland proxy
      5. setns to join the existing network namespace
      6. change the name of the veth device to eth1 in the container
      7. set the MAC address, IP, and MTU of the veth device
      8. set up the default gateway and route
      Note: the underlined parts are done by the init process, the others by the Docker daemon.
      Diagram: host eth1 (physical device) 10.27.149.90; docker0 (bridge) 172.17.42.1; container0 eth1 172.17.0.4 on veth device vethdb6e696; container1 eth1 172.17.0.5 on veth device veth8df64b7
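
Step 3, allocating a free IP, can be sketched as a lowest-free-address search within the bridge subnet. This is an illustration under assumptions, not Docker's allocator: the `ipAllocator` type is hypothetical, it handles IPv4 only, and it never releases addresses.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"net"
)

// ipAllocator is a hypothetical, simplified allocator for one IPv4 subnet.
type ipAllocator struct {
	subnet *net.IPNet
	used   map[string]bool
}

// newIPAllocator parses the subnet and marks the reserved addresses
// (for example the bridge's own address) as taken.
func newIPAllocator(cidr string, reserved ...string) (*ipAllocator, error) {
	_, subnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, err
	}
	if subnet.IP.To4() == nil {
		return nil, errors.New("only IPv4 subnets are supported")
	}
	a := &ipAllocator{subnet: subnet, used: map[string]bool{}}
	for _, r := range reserved {
		a.used[r] = true
	}
	return a, nil
}

// next returns the lowest free host address in the subnet.
func (a *ipAllocator) next() (net.IP, error) {
	base := binary.BigEndian.Uint32(a.subnet.IP.To4())
	ones, bits := a.subnet.Mask.Size()
	size := uint32(1) << uint(bits-ones)
	// Skip the network address (offset 0) and the broadcast address (last).
	for off := uint32(1); off < size-1; off++ {
		ip := make(net.IP, 4)
		binary.BigEndian.PutUint32(ip, base+off)
		if !a.used[ip.String()] {
			a.used[ip.String()] = true
			return ip, nil
		}
	}
	return nil, errors.New("no free address in subnet")
}

func main() {
	// docker0 address and prefix length taken from the slide's diagram.
	a, err := newIPAllocator("172.17.42.1/16", "172.17.42.1")
	if err != nil {
		fmt.Println(err)
		return
	}
	for i := 0; i < 2; i++ {
		ip, err := a.next()
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Println("allocated", ip)
	}
}
```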
  29. 29. Consistent MAC address
      • Docker generates a MAC address for each veth device that is consistent for a given IP address.
      • This avoids ARP cache issues.
      func generateMacAddr(ip net.IP) net.HardwareAddr {
          hw := make(net.HardwareAddr, 6)

          // The first byte of the MAC address has to comply with these rules:
          // 1. Unicast: Set the least-significant bit to 0.
          // 2. Address is locally administered: Set the second-least-significant bit (U/L) to 1.
          // 3. As "small" as possible: The veth address has to be "smaller" than the bridge address.
          hw[0] = 0x02

          // The first 24 bits of the MAC represent the Organizationally Unique Identifier (OUI).
          // Since this address is locally administered, we can do whatever we want as long as
          // it doesn't conflict with other addresses.
          hw[1] = 0x42

          // Insert the IP address into the last 32 bits of the MAC address.
          // This is a simple way to guarantee the address will be consistent and unique.
          copy(hw[2:], ip.To4())

          return hw
      }
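
As a usage sketch, the function from this slide can be wrapped in a small program. The body of `generateMacAddr` is reproduced from the slide; `macFor` is a hypothetical convenience wrapper, and the input is the container address 172.17.0.4 used in the bridge-mode diagram.

```go
package main

import (
	"fmt"
	"net"
)

// generateMacAddr is reproduced from the slide: a fixed 02:42 prefix
// followed by the four bytes of the IPv4 address.
func generateMacAddr(ip net.IP) net.HardwareAddr {
	hw := make(net.HardwareAddr, 6)
	hw[0] = 0x02
	hw[1] = 0x42
	copy(hw[2:], ip.To4())
	return hw
}

// macFor is a hypothetical helper that goes from a dotted IPv4 string
// to the textual MAC address.
func macFor(ipv4 string) string {
	return generateMacAddr(net.ParseIP(ipv4)).String()
}

func main() {
	fmt.Println(macFor("172.17.0.4")) // prints 02:42:ac:11:00:04
}
```

Since 172 is 0xac and 17 is 0x11, the IP address can be read straight back out of the last four bytes of the MAC.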
  30. 30. Port Mapping
      • The Docker daemon uses a map to record port and IP mappings
      • Connect to local subnet
        - userland proxy:
            docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 49153 -container-ip 172.17.0.2 -container-port 6379
        - hairpin NAT (new Docker versions)
          - enable /sys/class/net/$vethname/brport/hairpin_mode
      • Connect to others
        - iptables -I POSTROUTING -t nat -s 172.17.42.1/16 ! -o docker0 -j MASQUERADE
        - iptables -t nat -A DOCKER -p tcp -d 0/0 --dport 49153 ! -i docker0 -j DNAT --to-destination 172.17.0.2:6379
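
The bookkeeping behind `-P` can be sketched as a map from host port to container endpoint. This is a minimal illustration, not the daemon's code: `portMapper`, `binding`, and `dnatRule` are hypothetical names, and the 49153 to 65535 range is an assumption based on the ports seen in this deck. The rule string follows the DNAT command shown on the slide and is only rendered, never executed.

```go
package main

import (
	"errors"
	"fmt"
)

// Assumed dynamic port range; the deck shows ports 49153 and 49154.
const (
	beginPort = 49153
	endPort   = 65535
)

// binding is the container side of a port mapping.
type binding struct {
	containerIP   string
	containerPort int
}

// portMapper is a hypothetical stand-in for the daemon's bookkeeping.
type portMapper struct {
	next     int
	bindings map[int]binding // host port -> container endpoint
}

func newPortMapper() *portMapper {
	return &portMapper{next: beginPort, bindings: map[int]binding{}}
}

// mapPort reserves the next free host port for the container endpoint.
func (m *portMapper) mapPort(ip string, port int) (int, error) {
	for p := m.next; p <= endPort; p++ {
		if _, taken := m.bindings[p]; !taken {
			m.bindings[p] = binding{containerIP: ip, containerPort: port}
			m.next = p + 1
			return p, nil
		}
	}
	return 0, errors.New("no free host port")
}

// dnatRule renders the iptables DNAT rule in the form shown on the slide.
func dnatRule(hostPort int, b binding) string {
	return fmt.Sprintf(
		"iptables -t nat -A DOCKER -p tcp -d 0/0 --dport %d ! -i docker0 -j DNAT --to-destination %s:%d",
		hostPort, b.containerIP, b.containerPort)
}

func main() {
	m := newPortMapper()
	hostPort, err := m.mapPort("172.17.0.2", 6379)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(dnatRule(hostPort, m.bindings[hostPort]))
}
```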
  32. 32. Cgroups support by docker • cgroup components: cpuset, cpu, cpuacct, memory, devices, freezer, net_cls, blkio • docker run options: --memory, --cpuset, --cpu-shares, --device • docker pause/unpause • After starting the background "docker native" process, the Docker daemon writes its pid into the cgroup directories, such as /cgroup/memory/docker/$cid/, next to limit files like memory.limit_in_bytes
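
The last bullet can be sketched as a handful of file writes. The example assumes the cgroup v1 layout, where the limit goes into `memory.limit_in_bytes` and the pid into the `tasks` file of the same directory; `cgroupWrites` and `apply` are hypothetical helpers, and a temporary directory stands in for `/cgroup` so that the program runs without root.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// write is one "echo value > path" operation.
type write struct {
	path  string
	value string
}

// cgroupWrites lists, in order, the writes that set a memory limit for a
// container and then place a process in its cgroup (cgroup v1 layout).
func cgroupWrites(root, cid string, pid int, memLimit int64) []write {
	dir := filepath.Join(root, "memory", "docker", cid)
	return []write{
		{filepath.Join(dir, "memory.limit_in_bytes"), strconv.FormatInt(memLimit, 10)},
		{filepath.Join(dir, "tasks"), strconv.Itoa(pid)},
	}
}

// apply creates the cgroup directory if needed and performs each write.
func apply(writes []write) error {
	for _, w := range writes {
		if err := os.MkdirAll(filepath.Dir(w.path), 0o755); err != nil {
			return err
		}
		if err := os.WriteFile(w.path, []byte(w.value), 0o644); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// A temporary directory replaces the real cgroup mount point.
	root, err := os.MkdirTemp("", "cgroup-demo")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer os.RemoveAll(root)

	writes := cgroupWrites(root, "6f858e1563a5", os.Getpid(), 512<<20)
	if err := apply(writes); err != nil {
		fmt.Println(err)
		return
	}
	for _, w := range writes {
		fmt.Println(w.value, ">", w.path)
	}
}
```

The limit is written before the pid so that the process is never in the cgroup without its limit applied.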
  34. 34. How container starts
      1. the daemon creates a socketpair and starts a background child process, "docker native"
      2. the daemon creates network devices and applies cgroup settings
      3. the daemon sends the configuration to "docker native"
      4. the daemon receives any error message and waits for "docker native" to exit
      5. "docker native" receives config and env from the socketpair
      6. "docker native" joins the existing namespaces with the fds in /proc/$pid/ns/*
      7. "docker native" initializes the filesystem…
      8. "docker native" execs the entrypoint
      "docker native" is the init process in the container.
      Diagram labels: client, daemon, docker native, entrypoint; arrows: create, start, start, config, errors, exec
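
The config handshake in steps 1, 3, 4, and 5 can be simulated inside one process. This is a sketch under assumptions, not libcontainer's code: a goroutine plays the part of the "docker native" child, the `Config` struct and the JSON wire format are invented for illustration, and `syscall.Socketpair` limits the example to Linux and other Unix systems.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"os"
	"syscall"
)

// Config is a hypothetical subset of what the daemon sends to the child.
type Config struct {
	Entrypoint []string `json:"entrypoint"`
	Env        []string `json:"env"`
}

// handshake sends cfg over a socketpair and returns what the child side
// decoded. The child reports errors as text; closing its end without
// writing anything signals success.
func handshake(cfg Config) (Config, error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return Config{}, err
	}
	daemonEnd := os.NewFile(uintptr(fds[0]), "daemon")
	childEnd := os.NewFile(uintptr(fds[1]), "child")
	defer daemonEnd.Close()

	received := make(chan Config, 1)
	go func() { // stands in for the "docker native" process
		defer childEnd.Close()
		var got Config
		if err := json.NewDecoder(childEnd).Decode(&got); err != nil {
			fmt.Fprint(childEnd, err) // report the failure to the daemon
			return
		}
		received <- got
	}()

	// Step 3: send the configuration.
	if err := json.NewEncoder(daemonEnd).Encode(cfg); err != nil {
		return Config{}, err
	}
	// Step 4: read until the child closes its end; any data is an error.
	msg, err := io.ReadAll(daemonEnd)
	if err != nil {
		return Config{}, err
	}
	if len(msg) > 0 {
		return Config{}, errors.New(string(msg))
	}
	return <-received, nil
}

func main() {
	cfg := Config{
		Entrypoint: []string{"redis-server", "/etc/redis/redis.conf"},
		Env:        []string{"HOME=/"},
	}
	got, err := handshake(cfg)
	if err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println("child would exec:", got.Entrypoint)
}
```

In the real flow the child is a separate process that inherits one end of the socketpair, and it replaces itself with the entrypoint via exec after the filesystem is set up.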
  35. 35. Reference • Docker image specification • Linux container • Deep dive into Docker storage drivers • Docker Architecture (v1.3) • Hairpin_NAT • Linux Programmer's Manual NAMESPACES
