There is no container - Ori Pekelman


Ori Pekelman of Platform.sh's presentation at the Paris Container Day conference

Published in: Technology

  1. #ContainerDayFR There is no container
  2. Paris Container Day 2017. Ori Pekelman, GeekPush at Platform.sh. I am @OriPekelman everywhere (GitHub/Twitter/LinkedIn). Co-Founder & VP of Marketing for Platform.sh, an innovative second generation PaaS. My role usually spans beyond the technological aspects to business strategy, process design and product marketing.
  3. We are at Paris Container Day, so I could rightly imagine most people around have an understanding of the underpinnings of “containers”. But let’s have a show of hands to see how much time we are going to spend on which slide. Group A: I don’t know much about containers. It sounds interesting. I came here to learn.
  4. Group B: I use Docker. In production. It works and I never had to care about how it is implemented.
  5. Group C: I implement my own container stuff. I have Kernel-Fu. I know how this stuff is built.
  6. (1) This is meant as an entry-level talk, but I will still discuss some nuts and bolts, so when I am unclear, interrupt me. I don’t mind. (2) I am rusty. They make me do marketing these days. So when I am wrong, interrupt me. I don’t mind. (3) Even more so as we have the incredible honor of having people like Jessie Frazelle with us, people who participated in building many of the nuts and some of the bolts. So, please, Jessie and you other experts, forgive the depths of my ignorance and any and all lies and errors I am about to spout.
  7. What do containers solve? Why do we need containers? Containers allow us to package complex software in a reusable format that is easy to deploy, making automation easier. Sometimes they make updating software easier (with stateless systems, just build a new one and kill the old). They have lower memory overhead than VMs, so they are less expensive, and we can have more of them. They allow us to reason about the systems we run at a coarser granularity. AKA abstraction. In Greek, atom means “that which cannot be divided”. The container is our atom.
  13. The canonical image of the container is something like this:
  14. An orderly world where we put software in opaque boxes.
  15. The boxes have a common, simple interface that is not influenced by their content.
  16. From the outside we don’t care what is inside. There are no dependencies on the exterior world.
  17. That is our intuitive abstraction popularized by Docker™.
  18. We can move containers. Install them. Run them. Without ever knowing what was inside. `$ docker pull complex_piece_of_software:latest` `$ docker run complex_piece_of_software:latest`
  19. The “nuts and bolts” truth of the matter is probably the inverse. The container does not create opacity from the outside in.
  20. But from the inside out.
  21. From the system’s point of view
  22. This is the reality
  23. From the outside, the kernel and UID 0 see all. For them, there is no container.
  24. It is from the “containerized” process’s point of view that the world changes. Becomes smaller.
  25. When we create a container, what happens is that, using a bunch of different kernel features and modules (cgroups, namespaces, seccomp...), we:
  26. Limit the visibility of the outside world (namespaces)
  27. Limit the availability of resources from the outside world (cgroups)
  28. Sometimes outright lie about the world (namespaces)
  29. And limit what functionality the process can invoke from the kernel (seccomp... and more)
  31. So... let’s create a container. There is an operating system. In our case Linux. It abstracts away the hardware. No software on a normal computer runs “outside” of the operating system. Yup. Even assembly / machine code. You can’t access the processor, memory or hardware without going through it. What you run on Linux are ELF binaries. Nothing else. Your program interacts with the operating system through system calls: it can ask for memory, for access to stuff (like the network or the disk), or it can ask the operating system to run some other processes. A bunch of fun stuff.
  32. So... let’s create a container. Interactions with the OS pass through system calls, but sometimes the OS gets fancy and proposes higher-level constructs to make it easy (like a pseudo-file-system). Most often we will use libraries and full-blown integrated apps to take care of talking to the OS. More on that later. In Linux, processes are organized in a tree. Each process has an ID and a parent. Everything starts with 0, which is the scheduler, and 1, which is init. Everything else is going to get invoked from those and down. In Linux we have three different calls involved in starting a process: exec(), which we don’t really care about here; fork(), which copies the current process with a new PID; and clone(), which copies all or some of the current process and runs the new process as a child.
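To make the process tree concrete, here is a minimal Python sketch of fork(): the child is a copy of the parent with a new PID, and its parent ID points back at the process that forked it. The function name `fork_and_report` is made up for this illustration, and the sketch assumes a Unix-like system where `os.fork` is available.

```python
import os


def fork_and_report():
    """Fork once and return (parent_pid, child_pid, pid_seen_by_child, ppid_seen_by_child)."""
    parent_pid = os.getpid()
    read_fd, write_fd = os.pipe()
    child_pid = os.fork()
    if child_pid == 0:
        # Child: a copy of the parent, running with its own PID.
        os.close(read_fd)
        try:
            os.write(write_fd, f"{os.getpid()} {os.getppid()}".encode())
        finally:
            os._exit(0)  # never fall back into the parent's code path
    # Parent: read what the child saw, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        seen_pid, seen_ppid = map(int, pipe.read().split())
    os.waitpid(child_pid, 0)
    return parent_pid, child_pid, seen_pid, seen_ppid


if __name__ == "__main__":
    print(fork_and_report())
```

The PID the parent gets back from fork() and the PID the child reports for itself are the same number, and the child's parent ID is the forking process.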
  33. So... let’s create a container. So, how do we make the world seem smaller to a process? When creating our process we can pass a couple of parameters to clone() that will tell our operating system how it is going to live. A bunch of these parameters (or flags) are called CLONE_NEW[...SOMETHING...]. Some of these parameters, not all, can be modified later on using the unshare() system call.
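For reference, the CLONE_NEW* flags are plain bit values that get OR-ed together. The sketch below lists the values as defined in the Linux header `<linux/sched.h>`; the dictionary keys and the helper `namespace_flags` are names invented for this illustration.

```python
# Namespace flags from <linux/sched.h>, keyed by a short label.
CLONE_FLAGS = {
    "mnt": 0x00020000,     # CLONE_NEWNS (mount namespace)
    "cgroup": 0x02000000,  # CLONE_NEWCGROUP
    "uts": 0x04000000,     # CLONE_NEWUTS (hostname)
    "ipc": 0x08000000,     # CLONE_NEWIPC
    "user": 0x10000000,    # CLONE_NEWUSER
    "pid": 0x20000000,     # CLONE_NEWPID
    "net": 0x40000000,     # CLONE_NEWNET
}


def namespace_flags(*kinds):
    """Combine the requested namespace kinds into one flags value."""
    flags = 0
    for kind in kinds:
        flags |= CLONE_FLAGS[kind]
    return flags


if __name__ == "__main__":
    print(hex(namespace_flags("uts", "ipc", "pid")))
```

The combined value is what would be handed to clone() or unshare() to ask for several new namespaces at once.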
  34. So... let’s create a container. For example, the parameter CLONE_NEWUTS tells the operating system that: 1. Our newly created process can call sethostname() and that doing so, instead of changing the hostname for the whole OS, is going to keep a record of the hostname just for that namespace. 2. So when the process later calls gethostname(), it will return whatever was set through this namespace’s sethostname(). So unlike all of its cousins and parents, this process thinks the name of the machine it is running on is different. We tricked it! (Remember the part about lying?)
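A hedged sketch of the hostname trick, in Python through `ctypes`: a forked child calls unshare(CLONE_NEWUTS), sets a hostname and reads it back, while the parent's hostname stays untouched. The function name and the hostname "demo-container" are invented for the example. It assumes Linux with glibc, and it needs CAP_SYS_ADMIN (typically root); without it the function simply returns None.

```python
import ctypes
import os
import socket

CLONE_NEWUTS = 0x04000000  # value from <linux/sched.h>


def hostname_in_new_uts_namespace(new_name="demo-container"):
    """Return the hostname seen inside a fresh UTS namespace, or None if not permitted."""
    try:
        unshare = ctypes.CDLL(None, use_errno=True).unshare
    except (OSError, AttributeError, TypeError):
        return None  # not a Linux/glibc system
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        try:
            if unshare(CLONE_NEWUTS) == 0:
                socket.sethostname(new_name)  # only changes this namespace
                os.write(write_fd, socket.gethostname().encode())
        except OSError:
            pass
        finally:
            os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        seen = pipe.read()
    os.waitpid(pid, 0)
    return seen or None


if __name__ == "__main__":
    print("outside:", socket.gethostname())
    print("inside: ", hostname_in_new_uts_namespace())
```

The fork is there so that the experiment never touches the namespace of the calling process.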
  35. Setting up namespaces: So, we create a new process and we attach a namespace to it in one of three ways: at its creation, with the flags we pass to clone(); later, using the unshare() system call, which can change some of the namespaced resources; or using the setns() system call, which lets an already running process join an existing namespace.
  36. So... let’s create a container. Having a different machine name per process is cool. But not that useful, right? That is not a container. What else can we isolate?
  37. Isolating the file-system: As far as containers are concerned, the most important thing is the file-system. This is done through CLONE_NEWNS. 1. First we create the new mount namespace. 2. We can then unmount the stuff inherited from the parent namespace and mount the various things we need in our target dir (we want to get to a usable root file system). 3. Run `pivot_root` to make $TARGETDIR the new root (it also takes a second argument, a directory where the old root gets parked) and voilà! We can have different mounts and isolate parts of the file-system! As a side note, doing stuff like mounting requires “capabilities”, in this case CAP_SYS_ADMIN. More often than not these are going to have been dropped. So this is not always trivial.
  38. So... let’s create a container. We can decide what mounts are going to be shared from the “host”. We can totally decide that /var/lib is going to be common. Nothing disallows this. We can use some crazy layered file system (like AUFS or OverlayFS), which will allow us to mix stuff, some coming from the underlying OS and some “overridden” just for our namespace. Now, “container runtimes” like Docker, LXC or runc are a lot about preparing an image of a filesystem that can be mounted in a way that a process could run. If you look at the OCI (Open Container Initiative), it has two specs: one for this, the file system image, and one for the runtime.
  39. Isolating Inter-Process Communications: With CLONE_NEWIPC we limit our process’s ability to send and receive messages to processes in the same namespace. We don’t want our nice isolated process to talk with strangers, right?
  40. Isolate Process IDs! This (CLONE_NEWPID) is how, when you run ps -aux, you only see processes in your own namespace and its children (the PIDs won’t match those seen from outside. This is complex). Oops, I forgot to tell you, namespaces are hierarchical. Which is triple fun. So yes, containers can run inside other containers ad infinitum (really up to 32 levels, but, well, you know, details).
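One way to see that PIDs do not match across levels: on kernels 4.1 and later, the `NSpid` line of `/proc/<pid>/status` lists the PID of a process in each nested PID namespace it belongs to, outermost first. A small Python sketch that parses it follows; the function name and the sample text are made up for the illustration.

```python
def pids_across_namespaces(status_text):
    """Return the PIDs from the NSpid line of a /proc/<pid>/status dump, outermost first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []  # older kernels have no NSpid line


if __name__ == "__main__":
    # Hypothetical status excerpt: PID 4711 on the host, PID 1 inside its container.
    sample = "Name:\tnginx\nPid:\t4711\nNSpid:\t4711\t1\n"
    print(pids_across_namespaces(sample))
    with open("/proc/self/status") as status:
        print(pids_across_namespaces(status.read()))
```

A process that was never put in a nested PID namespace shows a single number on that line.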
  41. Isolating the Network: This (CLONE_NEWNET) is how your container gets its own IP. Yay, now it is a big boy. (We won’t get into this, but this is also where a lot of suffering will happen. Remember, from the kernel’s perspective this is just another interface. We will need to use either NAT, weird bridging or some creative uses of iptables to make sense of things. And this is clearly where we see how higher-level abstractions are a necessity.)
  42. Isolate User and group IDs: This is oh so important for unprivileged containers. Yes! Linux supports doing all of this from userspace. This basically means that the UID running inside does not exist outside. And that your process can feel blessedly aloof.
  43. man namespaces, USER_NAMESPACES(7): “A process's user and group IDs can be different inside and outside a user namespace. In particular, a process can have a normal unprivileged user ID outside a user namespace while at the same time having a user ID of 0 inside the namespace; in other words, the process has full privileges for operations inside the user namespace, but is unprivileged for operations outside the namespace.” This means quantum-state rootness! You are root and unprivileged at the same time!
  44. man namespaces, USER_NAMESPACES(7): “Each process is a member of exactly one user namespace. A process created via fork or clone without the CLONE_NEWUSER flag is a member of the same user namespace as its parent. A single-threaded process can join another user namespace with setns if it has the CAP_SYS_ADMIN in that namespace; upon doing so, it gains a full set of capabilities in that namespace.”
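The "root and unprivileged at the same time" effect can be sketched in Python with `ctypes`: a forked child calls unshare(CLONE_NEWUSER), maps its outside UID to 0 by writing one line to `/proc/self/uid_map`, and then asks for its UID. The function name is invented for this example. It assumes Linux with glibc and unprivileged user namespaces enabled; many sandboxes and hardened distributions disable them, in which case the function returns None.

```python
import ctypes
import os

CLONE_NEWUSER = 0x10000000  # value from <linux/sched.h>


def uid_inside_user_namespace():
    """Return (uid_outside, uid_inside), or None if user namespaces are unavailable."""
    try:
        unshare = ctypes.CDLL(None, use_errno=True).unshare
    except (OSError, AttributeError, TypeError):
        return None  # not a Linux/glibc system
    uid_outside = os.getuid()
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        try:
            if unshare(CLONE_NEWUSER) == 0:
                # Map UID 0 inside the namespace to our own UID outside.
                with open("/proc/self/uid_map", "w") as uid_map:
                    uid_map.write(f"0 {uid_outside} 1")
                os.write(write_fd, str(os.getuid()).encode())
        except OSError:
            pass
        finally:
            os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        seen = pipe.read()
    os.waitpid(pid, 0)
    return (uid_outside, int(seen)) if seen else None


if __name__ == "__main__":
    print(uid_inside_user_namespace())
```

Where it works, the pair shows an ordinary UID outside and 0 inside, for the very same process.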
  45. Almost last, but not least. Isolate resources! This is where this ties in to the earlier mechanism we were talking about, cgroups. Cgroups basically allow us to limit the resource usage of the process (and its children) in terms of memory, CPU usage and IO; CLONE_NEWCGROUP then gives the process its own view of the cgroup hierarchy.
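To see which cgroups a process sits in, each line of `/proc/<pid>/cgroup` has the form `hierarchy-ID:controller-list:path`. The Python sketch below parses one such line; the function name and the sample lines are illustrative, not taken from a real system.

```python
def parse_cgroup_line(line):
    """Split one /proc/<pid>/cgroup line into (hierarchy_id, controllers, path)."""
    hierarchy_id, controllers, path = line.strip().split(":", 2)
    # On cgroup v2 the controller list is empty ("0::/some/path").
    return int(hierarchy_id), [name for name in controllers.split(",") if name], path


if __name__ == "__main__":
    print(parse_cgroup_line("4:cpu,cpuacct:/docker/abc"))  # hypothetical v1 line
    with open("/proc/self/cgroup") as cgroup_file:
        for raw_line in cgroup_file:
            print(parse_cgroup_line(raw_line))
```

The path is relative to the cgroup filesystem mount, which is where the actual limits (memory, CPU, IO) are configured.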
  46. Really last: isolate all the things and the Kernel. This is of unholy complexity. Short story: Linux used to be mostly all or nothing. User 0 vs. the others. Now you have capabilities. A long list of capabilities. Which you can now go and set per process. And you have stuff like seccomp and seccomp-bpf to help you do just that. And you can use a bunch of modules and kernel patch sets to make everything more robust. Like SELinux, grsecurity or AppArmor.
  47. seccomp: “Seccomp is a mechanism in the Linux kernel that allows a process to make a one-way transition to a restricted state where it can only perform a limited set of system calls. If a process attempts any other system calls, it is killed via a SIGKILL signal. In its most restrictive mode, seccomp prevents all system calls other than read(), write(), _exit(), and sigreturn(). This would allow a program to initialize and then drop into a restricted mode where it could only read from/write to already-opened files.”
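A hedged Python sketch of the one-way transition: a forked child enters strict mode with prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) and then makes a system call that is not on the allow-list, so the kernel kills it. The constants 22 and 1 are the values from the Linux headers; the function name and the "killed" / "unsupported" return strings are invented for the example. It assumes Linux with glibc, and some sandboxes refuse the prctl call, in which case the result is "unsupported".

```python
import ctypes
import os
import signal

PR_SET_SECCOMP = 22      # from <linux/prctl.h>
SECCOMP_MODE_STRICT = 1  # from <linux/seccomp.h>


def strict_seccomp_demo():
    """Return "killed" if a child in strict mode was SIGKILLed, else "unsupported"."""
    try:
        prctl = ctypes.CDLL(None, use_errno=True).prctl
    except (OSError, AttributeError, TypeError):
        return "unsupported"  # not a Linux/glibc system
    pid = os.fork()
    if pid == 0:
        try:
            rc = prctl(PR_SET_SECCOMP, ctypes.c_ulong(SECCOMP_MODE_STRICT),
                       ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))
            if rc != 0:
                os._exit(3)  # strict mode refused; exit normally
            os.getppid()     # getppid is not on the allow-list: SIGKILL
        finally:
            os._exit(4)      # in strict mode even this exit call is fatal
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL:
        return "killed"
    return "unsupported"


if __name__ == "__main__":
    print(strict_seccomp_demo())
```

The child is sacrificed on purpose: once strict mode is on there is no way back, which is exactly the point of the mechanism.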
  48. seccomp-bpf: If seccomp is a sledgehammer, seccomp-bpf is the fine-grained version that allows specifying a filter that is applied to every system call.
  49. Looking under the hood: BTW, you get a nice pseudo filesystem with which you can interact to inspect and control these values. Try: `sudo ls -lai /proc/8/ns/` and `cat /proc/800/cgroup`
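The same look under the hood can be sketched in Python. Each entry in `/proc/<pid>/ns/` is a symbolic link whose target looks like `uts:[4026531838]`; two processes share a namespace exactly when those numbers match. The function names below are made up for this illustration, and reading another user's process may require root.

```python
import os
import re


def parse_ns_link(target):
    """Turn a link target like 'uts:[4026531838]' into ('uts', 4026531838)."""
    match = re.fullmatch(r"(\w+):\[(\d+)\]", target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespaces_of(pid="self"):
    """Map each entry of /proc/<pid>/ns to its namespace identifier."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for entry in sorted(os.listdir(ns_dir)):
        _, identifier = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        result[entry] = identifier
    return result


if __name__ == "__main__":
    for name, identifier in namespaces_of().items():
        print(f"{name:20} {identifier}")
```

Comparing the output for a shell on the host with the output for a containerized process shows which namespaces the runtime actually changed.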
  50. Unlike other isolation techniques (Solaris Zones, BSD Jails, VMs) this is an emergent thing. This is not a “first class” citizen. This was not designed. Different projects assemble different types of isolation, with different semantics, from all of these elements. ● Docker is about packaging a single executable. ● LXC wants to give you what feels like a virtual machine. ● FireJail is there as a sandbox to run stuff you don’t trust. GUI much. And this is a recent thing: user namespaces appeared in kernel release 3.8 on 18 Feb 2013.
  52. Quite far away from our intuitive abstraction popularized by Docker™
  53. The signatures and semantics of cgroups, namespaces and seccomp are different. Everything in this world is “race-condition” prone, and much of it, because of the mix of tooling, is complex and hard. Creating a Linux container, or “containerization”, is using these different mechanisms together in a coherent way so as to have the end result “feel” as if the process you are running is in an isolated machine. A container runtime is a packaging of the above to make it simple.
  54. Container runtimes try to turn something that really looks more like this
  55. into our abstract image.
  56. Containers as an abstraction: When you think about all these low-level knobs we can control (the machine name, the network interfaces, the file-system, the users, etc.) you see something else emerging. When we define how to “containerize” a piece of software we are extracting its contract. We are defining the minimal subset of resources it needs, and the minimal understanding of that piece of software that the runtime requires to reliably run it.
  57. The simple Docker Contract: There were other isolation techniques before Docker. But because it exposed such a simple contract, it gained the incredible traction it had. According to Docker, the contract of a piece of software was: ● A base image (a state of a file-system), which can itself be layered. ● A working directory. ● A build step (which was basically a bash script). ● A TCP port exposed to the world. ● Environment variables. ● A command to run.
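The six items of that contract can be written down as a small data structure. The Python sketch below is only an illustration: the class name `DockerContract`, its field names and the sample values are invented here, and the rendering uses the standard Dockerfile instructions FROM, WORKDIR, ENV, RUN, EXPOSE and CMD.

```python
import json
from dataclasses import dataclass, field


@dataclass
class DockerContract:
    """The contract from the slide, one field per bullet."""
    base_image: str
    workdir: str
    build_steps: list = field(default_factory=list)
    exposed_port: int = 80
    env: dict = field(default_factory=dict)
    command: list = field(default_factory=list)

    def to_dockerfile(self):
        """Render the contract as Dockerfile text."""
        lines = [f"FROM {self.base_image}", f"WORKDIR {self.workdir}"]
        lines += [f"ENV {key}={value}" for key, value in self.env.items()]
        lines += [f"RUN {step}" for step in self.build_steps]
        lines.append(f"EXPOSE {self.exposed_port}")
        lines.append("CMD " + json.dumps(self.command))  # exec form is a JSON array
        return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical application, for illustration only.
    contract = DockerContract(
        base_image="python:3.12-slim",
        workdir="/app",
        build_steps=["pip install flask"],
        exposed_port=8000,
        env={"APP_ENV": "production"},
        command=["python", "app.py"],
    )
    print(contract.to_dockerfile())
```

Note what the structure has no field for: other services, initial data, or configuration that depends on where the software runs.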
  58. Choosing a contract: The incredible success it had shows the Docker software and the Docker contract were good enough. And good enough is good. Sometimes great. At Platform.sh we run a container-based PaaS and we chose not to use Docker. ● Partly because the nuts and bolts at the time didn’t fit (it was too new/buggy for production in 2013/14). No user namespaces until two months ago. No immutability. Weird networking. ● Partly because we thought the contract wasn’t correct for our use case.
  59. Is it an efficient contract? ● The idea of mutable, layered base-images made creating the first generation of Docker containers easy. Which explains a lot of its popularity. So yes. ● But it is a messy thing. This is something Docker has advanced on by allowing immutable containers. Still, the default is that the container is mutable. And this is what the ecosystem looks like. ● Build-oriented, reproducible, semantic base-images allow for orders of magnitude better memory utilisation through deduplication, and orders of magnitude simpler operations. This is not something you can bolt on easily later. There is still strong inertia here.
  60. For some software (most software we cared about) this contract doesn’t really make sense. Not in the long run. Not at scale. In order to be useful, the contract that describes software also needed to describe: ○ How to build it ○ Everything it depends on (you can’t run WordPress without MySQL) ○ Its initial data structures (you can’t run WordPress without some data in MySQL) ○ Its basic configuration (most software needs to understand some things about its place in the world)
  61. These days there are a billion and one projects that add those capabilities. ○ First, of course, the Kubernetes ecosystem. ○ But using 30 different tools strung together doesn’t scream “abstraction” to us; it looks more like a DIY mess. And it hardly answers the questions: ■ What is the minimal subset of resources an app needs? ■ How can we make it run, reliably?
  62. The obligatory XKCD 435. ○ If our intuition is correct, and the minimal viable contract to run “arbitrary” software contains these other things, if the useful level to reason about software is the molecule, not the atom, then we need an organic chemistry set, not a physics set. ○ It doesn’t mean physics is wrong. Or that Docker is bad software.
  63. What would be a perfect contract for us? ● RO / immutable base-image that is not opaque ○ A semantic representation of system libraries (with lock files) ○ A reproducible, semantic build system (with lock files) ○ Potentially, a build step (which can basically be a bash script). ● RW / mutable base-image (mutable state), which is content addressable ● Mapping of working directories to the RW image. ● A list of exposed network protocols and their parameters ● Build-time environment variables / run-time environment variables ● Relationships to other containers, which should be semantic themselves (some containers make no sense and would not run without a database). ● The capability to understand change (diff as part of the model).
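Once relationships are part of the contract, the application becomes a graph, and a runtime can derive things from it, such as a safe start order. The Python sketch below is an assumption-laden illustration rather than a description of how Platform.sh does it: the function name and the service names are invented, and it relies on `graphlib` from the standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def start_order(relationships):
    """Order services so that everything a service depends on starts before it.

    `relationships` maps a service name to the set of services it needs.
    """
    return list(TopologicalSorter(relationships).static_order())


if __name__ == "__main__":
    # Hypothetical cluster: WordPress needs MySQL and a cache.
    cluster = {
        "wordpress": {"mysql", "redis"},
        "mysql": set(),
        "redis": set(),
    }
    print(start_order(cluster))
```

A circular relationship raises `graphlib.CycleError`, which is one small example of a contract violation that can be caught before anything is deployed.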
  64. Why are abstractions important? ● Because we chose a container description system that did not depend on the containerization method, we can swap out that part later, and this is a domain where everything moves fast. Shiny new becomes legacy in 6 months. ○ Our reproducible build system can create our base LXC systems (which we use in production), our VMs (which we also deploy when we need higher levels of isolation) or Docker images (which we use in our GitLab-based CI system). ● Because we went for read-only containers separated from the R/W mounts, we have gained factors in terms of density because of the level of memory deduplication.
  65. Why are abstractions important? ● Because we are describing the “minimal application” not as a single process but as a graph, and because we understand the protocol-layer interactions and what writes where to disk, we can have consistent operations over the cluster that are fast and safe. ● Which also means we do not suffer from the same limitations around running persistent services. ● It is easier to implement HA primitives when you understand who is writing to the disk and how, who has what ports opened, etc. ● When your base system is not .yaml but .yaml + git, and when your .yaml represents something that has meaning, you can implement change with much less friction.
  66. Platform.sh can clone an arbitrarily complex production cluster in less than a minute. With all of the data. To create ephemeral staging clusters on the fly. Every branch gets a URL with basically fail-proof deployments.
  67. Git-driven infrastructure: With a single git push you can deploy an arbitrarily complex cluster (with micro-services, message queues and the lot). Backup means a consistent point-in-time snapshot of the whole shebang.
  68. Automatically redundant architecture: High-performance, automatic high-availability
  70. There is no container but the cluster. ● This is a bonus slide in case I didn’t run out of time, which is fun as I had 66 slides for 30 minutes. ● At the beginning of our project we used the word Cluster to describe, well, half of the different primitives we had. But then it all became murky. So we started calling stuff Cluster, Kluster and Claster. Which stuck for a little bit but faded back again. ● Now cluster is back in all its glory, and a bit like with Hebrew, my mother tongue, well, people seem just to be able to guess the correct meaning of cluster from the context. ● Oh, we should really refresh that cluster.
  71. I am @OriPekelman everywhere. Questions?
