1. Taking Docker To production
JOSA TechTalk by Muayyad Saleh Alsadi
http://muayyad-alsadi.github.io/
3. What is Docker again? (quick review)
Containers
Uses Linux kernel features like:
● namespaces
● cgroups (control groups)
● capabilities
Platform
Docker is a key component of many PaaS. Docker provides a way to host images, pull them, run them, pause them, snapshot them into new images, view diffs ..etc.
Ecosystem
Like GitHub, Docker Hub provides publicly available community images.
4. Containers vs. VMs
No kernel in the guest OS (shared with the host).
Containers are more secure and isolated than chroot, and less isolated than VMs.
5. Why DevOps?
Devs want change.
Ops wants stability (no change).
They blame each other, fight each other.
DevOps resolves the conflict:
for devs: the docker image contains the same OS, same libraries, same versions, same config, ...etc.
for admins: the host is untouched and stable.
6. Devs Heaven (not for production)
docker compose can bring everything up, connect and link containers with a single command. It can mount a local dir inside the image (so the developer can use his/her favorite IDE). The command is
docker-compose up
It will read “docker-compose.yml”, which might look like:
mywebapp:
  image: mywebapp
  volumes:
    - .:/code
  links:
    - redis
redis:
  image: redis
7. Operations Heaven
Having a stable host!
CoreOS does not include any package manager, and does not even have python or other tools installed. They have a Fedora-based docker image called toolbox.
You can mix and match. Some containers run Java 6 or Java 7. Some use CentOS 6, others 7, others Ubuntu 14.04, others Fedora 22 ..etc., all on the same host.
8. Linking Containers
docker run -d --name r1 redis
docker run -d --name web --link r1:redis myweb
r1 is the container name
redis is the link alias
It will update /etc/hosts and set ENVs:
● <alias>_NAME=<THIS>/<THAT> # myweb/r1
● REDIS_PORT=<tcp|udp>://<IP>:<PORT>
● REDIS_PORT_6379_TCP_PROTO=tcp
● REDIS_PORT_6379_TCP_PORT=6379
● REDIS_PORT_6379_TCP_ADDR=172.17.1.15
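These env vars can be consumed in the linked container's entry point. A minimal sketch, assuming a link alias of "redis" (the fallback values are illustrative, for running outside docker):

```shell
#!/bin/sh
# Read the connection details Docker injects for a link aliased "redis";
# fall back to local defaults when the vars are absent (e.g. outside docker).
REDIS_HOST="${REDIS_PORT_6379_TCP_ADDR:-127.0.0.1}"
REDIS_PORT="${REDIS_PORT_6379_TCP_PORT:-6379}"
echo "connecting to redis at ${REDIS_HOST}:${REDIS_PORT}"
```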
9. Pets vs. Cattle vs. Ants
Pets (virtualization)
The VM has:
● lovely distinct names
● emotions
● many highly coupled roles
● if it's down, it’s a catastrophe
Cattle (cloud)
● no names
● no emotions
● single role
● decoupled (loosely coupled)
● load-balanced
● if one is down, other VMs take over
● VM failure is planned and part of the process
Ants (docker containers)
Containers are like cloud VMs: no names, no emotions, load-balanced. A single host (which might be a VM) is highly dense. The host is stable. Large groups of containers are designed to fail as part of the process.
10. What docker is not
● docker is not a hypervisor
○ docker is for process containers not system containers
○ example of system containers: LXD and OpenVZ
● no systemd/upstart/sysvinit in the container
○ docker is for process containers not system containers
○ just run apache, nginx, solr, whatever
○ TTYs are not needed
○ crons are not needed
● Docker is not for multi-tenant
HINT: LXD is a stupid way of winning a meaningless benchmark
12. Docker golden rules
by @gionn on Twitter:
● only one process per image
● no embedded configuration
● no sshd, no syslog, no tty
● no! you don't touch a running container to adjust things
● no! you will not use a community image
13. Theory vs. Reality
Docker's imaginary “unicorn” apps:
● statically compiled (no dependencies)
● written in golang
● container ~ 10MB
In the real world:
● interpreted application (python, php)
● system dependencies, config files, log files
● multiple processes (nginx, php-fpm)
● container image >500MB
14. 12 Factor - http://12factor.net/
1. One codebase (in git), many deploys
2. Explicitly declare and isolate dependencies
3. Get config from the environment or service discovery
4. Treat backing services as attached resources (Database, SMTP, S3, ..etc.)
5. Strictly separate build and run stages (no minifying css/js in the run stage)
6. Execute the app as one or more stateless processes (data and state are persisted elsewhere apart from the app, no need for sticky sessions)
7. Export a port (an endpoint to talk to)
8. Scale out via the process model
9. Disposability: maximize robustness with fast startup and graceful shutdown
10. Keep development, staging, and production as similar as possible
11. Logs: a flow of events written to stdout that is captured by the execution environment
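Factor 3 in shell terms, a minimal sketch (the variable names and defaults are illustrative, not from the talk):

```shell
#!/bin/sh
# 12-factor config: read settings from the environment, with defaults
# for development; production sets the real values (e.g. docker run -e).
DATABASE_URL="${DATABASE_URL:-postgres://localhost/devdb}"
LOG_LEVEL="${LOG_LEVEL:-info}"
echo "DATABASE_URL=${DATABASE_URL}"
echo "LOG_LEVEL=${LOG_LEVEL}"
```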
15. 12 Factor
The last factor is administrative processes:
● Run admin/management tasks as one-off processes
○ in django: manage.py migrate
● One-off admin processes should be run in an identical environment as the regular long-running processes of the app
● shipped from the same code (same git repo)
Example of a 12-factor app: bedrock - a 12 factor wordpress
https://roots.io/bedrock/
16. 12 Factor - Factorish
Can be found at https://github.com/factorish/factorish
example:
https://github.com/factorish/factorish-elk
17. Config
● confd
○ written in go (a statically linked binary)
○ input
■ env variables
■ service discovery (like etcd and consul)
■ redis
○ output
■ golang template with {{something}}
● crudini, jq
● http://gliderlabs.com/registrator/latest/user/quickstart/
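As a sketch of how confd ties these together (the paths, keys, and file names here are illustrative, not from the talk): a template resource tells confd which keys to watch and which file to render, and the template itself uses golang {{...}} syntax.

```
# /etc/confd/conf.d/myapp.toml (hypothetical template resource)
[template]
src  = "myapp.conf.tmpl"
dest = "/etc/myapp/myapp.conf"
keys = ["/myapp/redis/host", "/myapp/redis/port"]
```

```
# /etc/confd/templates/myapp.conf.tmpl (golang template)
redis_host = {{getv "/myapp/redis/host"}}
redis_port = {{getv "/myapp/redis/port"}}
```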
18. Config
● the container’s entry point (“/start.sh”) calls a REST API to add itself to haproxy or any other load balancer
● the container’s entry point uses a discovery service client (ex. etcdctl)
● something listens to docker events and sends each container’s ENV and labels to the discovery service
27. Building Docker Images
● Dockerfile and “docker build -t myrepo/myapp .”
○ I have a proposal using pivot root inside the Dockerfile (docker build would build the build environment, then use another fresh small container as the target, copy the build result and pivot). The Docker builder is frozen but details are here
● Dockramp
○ https://github.com/jlhawn/dockramp
○ external builder written in golang
○ uses only the docker api (needs the new “cp” api)
○ can implement my proposal
● Atomic App / Nulecule / OpenShift have their own way
● Use Fabric/Ansible to build
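A minimal Dockerfile sketch for the "docker build" route above (the base image, package, and app names are illustrative):

```
# Build with: docker build -t myrepo/myapp .
FROM centos:7
RUN yum -y install python && yum clean all
COPY . /code
WORKDIR /code
EXPOSE 8000
# one process per image: run the app directly, no init/sshd/cron
CMD ["python", "app.py"]
```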
29. Seriously?
Docker on production!
“Docker is about running random code downloaded from the Internet and running it as root.”
-- a Red Hat engineer
30. Host your own private registry
● host a private docker registry (so you don’t download random code from random people on the internet)
● use HTTPS, be your own certificate authority and trust it on your docker hosts
● use registry version 2 and apply ACLs on images
○ URLs in v2 look like /v2/<name>/blobs/<digest>
● use HTTP Basic Auth (apache/nginx) with whatever back-end you like (ex. LDAP or just plain files)
● have a read-only user as your “deployer” on servers
● have a build server to push images (not developers)
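For the HTTP Basic Auth point, the password file can be generated with openssl; a minimal sketch (user names and passwords are illustrative; apache and nginx both accept the APR1 format):

```shell
#!/bin/sh
# Create an htpasswd-style file for the registry front end:
# "deployer" is the read-only user servers pull with,
# "builder" is the only user allowed to push (used by the build server).
printf 'deployer:%s\n' "$(openssl passwd -apr1 'deploy-secret')"  > registry.htpasswd
printf 'builder:%s\n'  "$(openssl passwd -apr1 'build-secret')"  >> registry.htpasswd
grep -c ':' registry.htpasswd
```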
31. “Containers do not contain.”
-- Dan Walsh (Redhat / SELinux)
Seriously?
Docker on production!
32. Docker and the next Venom?
In May 2015, a catastrophic vulnerability affected kvm/xen in almost every datacenter. Fedora/RHEL/CentOS had been secure because of SELinux/sVirt (since 2009). AppArmor was a joke that is not funny.
sVirt does support Docker: what happens in a container stays in the container.
http://www.zdnet.com/article/venom-security-flaw-millions-of-virtual-machines-datacenters/
https://fedoraproject.org/wiki/Features/SVirt_Mandatory_Access_Control
33. Recommendations
● Drop privileges as quickly as possible
● Run your services as non-root whenever possible
○ apache needs root to open port 80, but you are going to proxy the port anyway, so run it as non-root directly
● Treat root within a container as if it is root outside of the container
● Do not give CAP_SYS_ADMIN to a container (it’s equivalent to host root)
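The non-root recommendation in Dockerfile form, a minimal sketch (the base image, user name, port, and binary are illustrative):

```
FROM centos:7
# create an unprivileged user and drop to it; bind an unprivileged
# port (8080) since the proxy in front handles port 80 anyway
RUN useradd -r -s /sbin/nologin webuser
USER webuser
EXPOSE 8080
CMD ["./myserver", "--port", "8080"]
```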
34. Setting a proper storage backend
● docker info | grep ‘Storage Driver’
● possible drivers/backends:
○ aufs: a union filesystem of such low quality that it was never part of the official linux kernel
○ overlay: a modern union filesystem that was accepted in kernel 4.0 (too young)
○ zfs: linux port of the well-established solaris filesystem; the quality of the port and driver is still questionable
○ btrfs: the most featureful linux filesystem; too early to use in production
○ devicemapper (thin provisioning): well-established redhat technology (already in production, ex. LVM)
● do not use the default loopback config in EL (RHEL/CentOS/Fedora)
○ WARNING: No --storage-opt dm.thinpooldev specified, using loopback; this configuration is strongly discouraged for production use
● in EL, edit /etc/sysconfig/docker-storage
● http://developerblog.redhat.com/2014/09/30/overview-storage-scalability-docker/
● http://www.projectatomic.io/blog/2015/06/notes-on-fedora-centos-and-docker-storage-drivers/
● http://www.projectatomic.io/docs/docker-storage-recommendation/
35. Storage backend (using script)
man docker-storage-setup
vim /etc/sysconfig/docker-storage-setup
docker-storage-setup
● DEVS=“/dev/sdb /dev/sdc”
○ list of unpartitioned devices to be used or added
○ if you are adding more, remove old ones
○ required if VG is specified and does not exist
● VG=“<my-volume-group>”
○ set to empty to use unallocated space in root’s VG
36. Storage backend (manual)
pvcreate /dev/sdc
vgcreate direct-lvm /dev/sdc
lvcreate --wipesignatures y -n data direct-lvm -l 95%VG
lvcreate --wipesignatures y -n metadata direct-lvm -l 5%VG
dd if=/dev/zero of=/dev/direct-lvm/metadata bs=1M
vim /etc/sysconfig/docker-storage # to add next line
DOCKER_STORAGE_OPTIONS = --storage-opt dm.metadatadev=/dev/direct-lvm/metadata --storage-opt dm.datadev=/dev/direct-lvm/data
systemctl restart docker
37. Docker Volumes
Never put data inside the container (logs, database files, ..etc.). Data should go to mounted volumes.
You can mount folders or files. You can mount RW or RO.
You can have a busybox container with volumes and mount all volumes of that container in another container:
# docker run -d --volumes-from my_vols --name db1 training/postgres
38. Seriously?
Docker on production!
Everything is a child process of a single daemon. Seriously!
39. Docker process model is flawed
The docker daemon launches containers as attached child processes. If the daemon dies, all of them collapse in a fatal catastrophe. Moreover, the docker daemon has many moving parts; for example, fetching images is done inside the daemon. A bad network while fetching an image, or an evil image, might collapse all containers.
https://github.com/docker/docker/issues/15328
An evil client, an evil request, an evil image, an evil container, or an evil “inspect” template might cause the docker daemon to go crazy and risk all containers.
40. Docker process model is flawed
CoreOS introduced a saner process model in rkt (Rocket), an alternative docker-like container runtime. RedHat contributes to both docker and rocket, as both have high potential. Rkt is just a container runtime where you can run containers as non-root and without being a child of anything (ex. relying on systemd/D-Bus). Rocket is not a platform (no layers, no image registry service, ..etc.)
https://github.com/coreos/rkt/
Docker might evolve to fix this; dockerlite is a shell script that uses LXC and BTRFS
https://github.com/docker/dockerlite
For now, just design your cluster to fail and use anti-affinity.
42. Docker Networking now
Docker uses Linux bridges, which only connect containers within the same host. Containers on host A can’t talk to containers on host B! NAT is used to talk to the outside world:
# iptables -t nat -A POSTROUTING -s 172.17.0.0/16 -j MASQUERADE
Exported ports in docker are handled by a docker proxy process (written in go); check “netstat -tulnp”.
The deprecated geard used to connect multiple hosts using NAT, and configured each container to talk to localhost for everything (ex. talk to localhost MySQL and NAT will take it to the MySQL container on another host):
# iptables -t nat -A PREROUTING -d ${local_ip}/32 -p tcp -m tcp --dport ${local_port} -j DNAT --to-destination ${remote_ip}:${remote_port}
# iptables -t nat -A OUTPUT -d ${local_ip}/32 -p tcp -m tcp --dport ${local_port} -j DNAT --to-destination ${remote_ip}:${remote_port}
# iptables -t nat -A POSTROUTING -o eth0 -j SNAT --to-source ${container_ip}
43. Docker Networking now
A similar approach is to manually hard-code and divide docker bridges on each host as 172.16.X.y, where X is the host and y is the container, and use NAT to deliver packets (or 172.X.y.y, depending on the number of hosts and the number of containers on each host).
http://blog.sequenceiq.com/blog/2014/08/12/docker-networking/
Given a remote host with IP 192.168.40.12 and its docker0 bridge on 172.17.52.0/24, and given a host with docker0 on 172.17.51.0/24, on the latter host type:
route add -net 172.17.52.0 netmask 255.255.255.0 gw 192.168.40.12
iptables -t nat -F POSTROUTING # or pass "--iptables=false" to the docker daemon
iptables -t nat -A POSTROUTING -s 172.17.51.0/24 ! -d 172.17.0.0/16 -j MASQUERADE
44. Docker Networking Alternatives
● OpenVSwitch (well-established production technology)
● Flannel (young project from CoreOS written in golang)
● Weave (https://github.com/weaveworks/weave)
● Calico (https://github.com/projectcalico/calico)
45. Docker Networking Alternatives
OpenVSwitch:
Just like a physical switch, this virtual switch connects different hosts.
One setup connects each container to OVS without a bridge: “docker run --net=none”, then use the ovs-docker script.
The other setup just replaces the docker0 bridge with one that is connected to OVS (no change needs to be done to each container).
46. Docker Networking Alternatives
# ovs_vsctl add-br sw0
or /etc/sysconfig/network-scripts/ifcfg-sw0
then
# ip link add veth_s type veth
peer veth_c
# brctl addif docker0 veth_c
# ovs_vsctl add-port sw0 veth_s
see /etc/sysconfig/network-scripts/ifup-ovs
http://git.openvswitch.org/cgi-bin/gitweb.cgi?
p=openvswitch;a=blob_plain;f=rhel/README.
RHEL;hb=HEAD
47. Networking the future
In the future, libnetwork will allow docker to use SDN plugins. Docker acquired SocketPlane to implement this.
https://github.com/docker/libnetwork
https://github.com/docker/libnetwork/blob/master/ROADMAP.md
48. Introducing Docker Glue
● docker-glue - a modular pluggable daemon that can run handlers and scripts
● docker-balancer - a standalone daemon that just updates haproxy (a special case of glue)
https://github.com/muayyad-alsadi/docker-glue
Autoconfigures haproxy to pass traffic to your containers. Uses docker labels (“-l”) to specify the http host or url prefix:
# docker run -d --name wp1 -l glue_http_80_host='wp1.example.com' mywordpress/wordpress
# docker run -d --name wp2 -l glue_http_80_host='wp2.example.com' mywordpress/wordpress
# docker run -d --name panel -l glue_http_80_host=example.com -l glue_http_80_prefix=dashboard/ myrepo/control-panel
49. Introducing Docker Glue
Run anything based on docker events (test.ini):
[handler]
class=DockerGlue.handlers.exec.ScriptHandler
events=all
enabled=1
triggers-none=0
[params]
script=test-handler.sh
demo-option=some value
# it will run
test-handler.sh /path/to/test.ini <EVENT> <CONTAINER_ID>
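A handler script for the config above could look like this minimal sketch (the function body is illustrative; per the invocation shown, docker-glue passes the ini path, event name, and container id as arguments):

```shell
#!/bin/sh
# test-handler.sh: invoked by docker-glue as
#   test-handler.sh /path/to/test.ini <EVENT> <CONTAINER_ID>
handle_event() {
    ini="$1"; event="$2"; container="$3"
    echo "event=${event} container=${container} config=${ini}"
}
# docker-glue supplies these as positional arguments; the defaults here
# are only so the sketch runs standalone.
handle_event "${1:-/path/to/test.ini}" "${2:-start}" "${3:-abc123}"
```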