This document provides an overview of CoreOS, an open source operating system designed for clusters and distributed systems. CoreOS is lightweight, uses Docker containers, automatically updates in a way that is quick and reliable, and has tools like etcd for service discovery and fleet for orchestrating containers across a cluster. The document includes code examples of setting up a CoreOS cluster on Vagrant and using fleet to launch and manage containers.
7. Lightweight
CoreOS is designed to be a modern, minimal base to build your
platform. Consumes 40% less RAM on boot than an average
Linux installation.
https://coreos.com/
8. Painless Updating
Utilizes an active/passive dual-partition scheme to update
the OS as a single unit instead of package by package. This
makes each update quick, reliable and able to be easily rolled
back.
https://coreos.com/
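The active/passive scheme can be pictured with a toy model. This is only a sketch under assumptions: the class, the slot names "A" and "B", and the version strings are hypothetical and are not taken from CoreOS itself.

```python
from dataclasses import dataclass, field


@dataclass
class DualPartitionHost:
    """Toy model of an active/passive partition pair (all names hypothetical)."""
    versions: dict = field(default_factory=lambda: {"A": "v1", "B": "v1"})
    active: str = "A"

    @property
    def passive(self) -> str:
        return "B" if self.active == "A" else "A"

    @property
    def running(self) -> str:
        return self.versions[self.active]

    def stage_update(self, version: str) -> None:
        # The whole OS image goes onto the passive slot; the slot that is
        # currently running is never modified in place.
        self.versions[self.passive] = version

    def reboot_into_passive(self) -> None:
        # Swapping the two roles is the entire "install" step.
        self.active = self.passive

    def rollback(self) -> None:
        # The previous image is still intact on the other slot, so a
        # rollback is just another swap.
        self.active = self.passive


host = DualPartitionHost()
host.stage_update("v2")
print(host.running)   # still the old image until the reboot
host.reboot_into_passive()
print(host.running)   # now the new image
host.rollback()
print(host.running)   # back on the old image
```

Because nothing is patched package by package, there is no half-updated state to recover from in this model.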
9. Docker Containers
Applications on CoreOS run as Docker containers. Containers
provide maximum flexibility in packaging and can start in
milliseconds.
https://coreos.com/
10. Clustered By Default
CoreOS works well on a single machine, but it's designed to
be clustered. Easily run application containers across
multiple machines with fleet and connect them together with
service discovery.
https://coreos.com/
11. Distributed Systems Tools
Built-in primitives such as distributed locking and master
election are the building blocks for large scale distributed
systems.
https://coreos.com/
12. Service Discovery
Easily locate where services are being run within the cluster
and be notified when something changes. Essential for a
complex, highly dynamic cluster. Built into CoreOS with high
availability and automatic fail-over.
https://coreos.com/
13. How is it different from other *NIXes?
14. No package manager
All your applications should run as containers
Linux kernel, docker, systemd, fleetd, etcd, sshd
According to https://coreos.com, it uses 114 MB of RAM at
boot, approximately 40% less than an average Linux server
Designed specifically for running distributed systems
19. What do you have to do differently?
etcd service discovery
broadcast your application's key infrastructure settings back to etcd
use fleet to orchestrate your containers
22. A highly-available key value store for
shared configuration and service
discovery. etcd is inspired by Apache
ZooKeeper and doozer
https://github.com/coreos/etcd#readme-version-046
23. Simple: curl'able user facing API (HTTP+JSON)
Secure: optional SSL client cert authentication
Fast: benchmarked 1000s of writes/s per instance
Reliable: properly distributed using Raft
etcd is written in Go and uses the Raft consensus algorithm
to manage a highly-available replicated log.
https://github.com/coreos/etcd#readme-version-046
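To make "curl'able" concrete, here is a minimal sketch that builds (but does not send) a request against the etcd v2 keys API and reads a value out of a response body. It assumes etcd 0.4.x on its default client port 4001 on localhost. The key /apps/fantasy_web matches the keys listed later in this deck, but the value and TTL are made up for illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: etcd 0.4.x reachable on its default client port on localhost.
ETCD_URL = "http://127.0.0.1:4001"


def build_set_request(key: str, value: str, ttl: Optional[int] = None) -> urllib.request.Request:
    """Build (without sending) a PUT to the v2 keys API that sets `key`."""
    form = {"value": value}
    if ttl is not None:
        form["ttl"] = str(ttl)  # seconds until the key expires
    return urllib.request.Request(
        url=f"{ETCD_URL}/v2/keys/{key.lstrip('/')}",
        data=urllib.parse.urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="PUT",
    )


def node_value(response_body: str) -> str:
    """Pull the stored value out of a v2 JSON response."""
    return json.loads(response_body)["node"]["value"]


# Hypothetical value and TTL, for illustration only.
request = build_set_request("/apps/fantasy_web", "172.17.8.102:80", ttl=60)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen() would send it to a running cluster. The sketch stops short of that so it can be read and run without one.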
25. In Search of an Understandable Consensus Algorithm by
Stanford's Diego Ongaro and John Ousterhout
https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf
"As a result, each state machine processes the same series
of commands and thus produces the same series of results
and arrives at the same series of states."
http://raftconsensus.github.io/
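The quoted property can be illustrated with a tiny deterministic state machine. This is a sketch only: the command format and key names are invented for the example and are not etcd's internal log format.

```python
def apply_log(log):
    """Apply set/delete commands in order and return the resulting state."""
    state = {}
    for op, key, value in log:
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state


# Hypothetical replicated log: every node receives the same commands
# in the same order.
log = [
    ("set", "/apps/karlgrz_web", "up"),
    ("set", "/apps/fantasy_web", "up"),
    ("delete", "/apps/karlgrz_web", None),
]

# Three independent replicas each apply the log from scratch.
replicas = [apply_log(log) for _ in range(3)]
print(all(replica == replicas[0] for replica in replicas))
```

Raft's job is the hard part this sketch leaves out: getting every node to agree on the contents and order of that log.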
27. Raft elects a leader. The leader appends each write to its log
and replicates it to the other nodes in the cluster. It does
not confirm the write until a majority (a quorum) of nodes
has acknowledged it.
If the leader goes AWOL for longer than the election timeout,
the remaining nodes hold a new election and carry on under
the new leader.
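The rule for when a write counts as confirmed can be sketched in a few lines. The function below is a simplification for illustration, not etcd's or Raft's actual code: it only counts acknowledgements and ignores terms, log indexes and retries.

```python
def write_is_confirmed(followers_acked):
    """Return True once a majority of the cluster holds the entry.

    `followers_acked` has one boolean per follower; the leader always
    counts as one acknowledgement because it wrote the entry itself.
    """
    cluster_size = 1 + len(followers_acked)
    acks = 1 + sum(1 for acked in followers_acked if acked)
    return acks > cluster_size // 2


# Three-node cluster (leader plus two followers).
print(write_is_confirmed([True, False]))   # one follower answered
print(write_is_confirmed([False, False]))  # nobody answered
```

With one follower answering, two of three nodes hold the entry and the write is confirmed. With none, the leader is alone and has to keep waiting.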
28. For now, just understand...
Raft is similar to Paxos in fault tolerance and performance.
It makes sure that etcd and your cluster can continue
operating even if some nodes are partitioned away (or
terminated!), as long as a majority of nodes can still reach
each other.
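How many nodes a cluster can lose follows from the majority rule. Here is a small worked sketch; the helper names are made up for this example.

```python
def tolerated_failures(cluster_size):
    """Largest number of nodes that can be lost while a majority survives."""
    return (cluster_size - 1) // 2


def can_make_progress(reachable_nodes, cluster_size):
    """A partition side can elect a leader only if it holds a majority."""
    return reachable_nodes > cluster_size // 2


for size in (1, 3, 5):
    print(size, "nodes tolerate", tolerated_failures(size), "failure(s)")
```

For a three-machine cluster this means one node can disappear and the remaining two still form a majority. A side of a partition with only one of the three nodes cannot accept writes.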
29. This is an AWESOME animation you should watch because it
explains Raft MUCH better than I can:
http://thesecretlivesofdata.com/raft/
44. fleet (0.8) seems very early, rough, and
opinionated whereas etcd seems ready
for production
45. ...but it feels like the best option out
there right now
46. Read this post later:
http://lukebond.ghost.io/deploying-docker-containers-on-a-coreos-cluster-with-fleet/
I found this while putting together this presentation, and I
think it does a great job explaining all this in written form
53. start up some units
core@core-01 ~/share/karlgrz-docker/fleet $ fleetctl start fantasy_web.service \
    jcsdoorsolutions_web.service stickfigureninjas_web.service karlgrz_web.service
Unit fantasy_web.service launched on adddf8be.../172.17.8.102
Unit karlgrz_web.service launched on adddf8be.../172.17.8.102
Unit jcsdoorsolutions_web.service launched on 78e5ab3e.../172.17.8.103
Unit stickfigureninjas_web.service launched on 78e5ab3e.../172.17.8.103
54. list loaded units and their status
core@core-01 ~/share/karlgrz-docker/fleet $ fleetctl list-units
UNIT MACHINE ACTIVE SUB
fantasy_web.service adddf8be.../172.17.8.102 activating start-pre
jcsdoorsolutions_web.service 78e5ab3e.../172.17.8.103 activating start-pre
karlgrz_web.service adddf8be.../172.17.8.102 active running
stickfigureninjas_web.service 78e5ab3e.../172.17.8.103 activating start-pre
core@core-01 ~/share/karlgrz-docker/fleet $ fleetctl list-units
UNIT MACHINE ACTIVE SUB
fantasy_web.service adddf8be.../172.17.8.102 active running
jcsdoorsolutions_web.service 78e5ab3e.../172.17.8.103 active running
karlgrz_web.service adddf8be.../172.17.8.102 active running
stickfigureninjas_web.service 78e5ab3e.../172.17.8.103 active running
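If you want to wait for everything to come up instead of rerunning the command by hand, the listing is easy to parse. The sketch below assumes the four whitespace-separated columns shown on this slide; other fleetctl versions may print different columns. The sample text is copied from the first listing above.

```python
from typing import NamedTuple


class Unit(NamedTuple):
    name: str
    machine: str
    active: str
    sub: str


def parse_list_units(text):
    """Parse `fleetctl list-units` output with UNIT MACHINE ACTIVE SUB columns."""
    units = []
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a four-column row.
        if len(fields) != 4 or fields[0] == "UNIT":
            continue
        units.append(Unit(*fields))
    return units


def not_yet_running(units):
    """Names of units that have not reached active/running."""
    return [u.name for u in units if (u.active, u.sub) != ("active", "running")]


SAMPLE = """\
UNIT MACHINE ACTIVE SUB
fantasy_web.service adddf8be.../172.17.8.102 activating start-pre
jcsdoorsolutions_web.service 78e5ab3e.../172.17.8.103 activating start-pre
karlgrz_web.service adddf8be.../172.17.8.102 active running
stickfigureninjas_web.service 78e5ab3e.../172.17.8.103 activating start-pre
"""

print(not_yet_running(parse_list_units(SAMPLE)))
```

An empty list means every unit has reached active/running, which is the state the second listing shows.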
56. run discovery sidekicks
core@core-01 ~/share/karlgrz-docker/fleet $ etcdctl ls /apps
core@core-01 ~/share/karlgrz-docker/fleet $ fleetctl start fantasy_discovery.service \
    jcsdoorsolutions_discovery.service stickfigureninjas_discovery.service \
    karlgrz_discovery.service
Unit jcsdoorsolutions_discovery.service launched on 78e5ab3e.../172.17.8.103
Unit stickfigureninjas_discovery.service launched on 78e5ab3e.../172.17.8.103
Unit fantasy_discovery.service launched on adddf8be.../172.17.8.102
Unit karlgrz_discovery.service launched on adddf8be.../172.17.8.102
core@core-01 ~/share/karlgrz-docker/fleet $ etcdctl ls /apps
/apps/rethinkdb_services
/apps/fantasy_web
/apps/karlgrz_web
/apps/jcsdoorsolutions_web
/apps/stickfigureninjas_web
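The sidekick unit files themselves are not shown in this deck. A common pattern, assumed here, is for each sidekick to keep rewriting its key with a time-to-live, so the entry vanishes by itself if the sidekick (or its machine) dies. The toy in-memory store below stands in for etcd to show the effect. The 60 second TTL, the 45 second refresh interval and the stored value are all made-up numbers.

```python
class TTLStore:
    """In-memory stand-in for keys with a time-to-live (not the etcd API)."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl, now):
        self._data[key] = (value, now + ttl)

    def ls(self, prefix, now):
        # Only keys whose TTL has not run out are visible.
        return sorted(
            key
            for key, (_, expires_at) in self._data.items()
            if key.startswith(prefix) and expires_at > now
        )


store = TTLStore()

# Hypothetical sidekick heartbeat: refresh every 45 s with a 60 s TTL.
for now in (0, 45, 90):
    store.set("/apps/fantasy_web", "172.17.8.102", ttl=60, now=now)

print(store.ls("/apps", now=100))  # sidekick alive recently: key is listed
print(store.ls("/apps", now=151))  # last refresh at 90 s expired at 150 s
```

The second listing is empty because nothing refreshed the key after the last heartbeat. That is how consumers notice a service has gone away.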
58. list units
core@core-01 ~/share/karlgrz-docker/fleet $ fleetctl list-units
UNIT MACHINE ACTIVE SUB
fantasy_discovery.service adddf8be.../172.17.8.102 active running
fantasy_web.service adddf8be.../172.17.8.102 active running
jcsdoorsolutions_discovery.service 78e5ab3e.../172.17.8.103 active running
jcsdoorsolutions_web.service 78e5ab3e.../172.17.8.103 active running
karlgrz_discovery.service adddf8be.../172.17.8.102 active running
karlgrz_web.service adddf8be.../172.17.8.102 active running
rethinkdb_discovery.service df763c2f.../172.17.8.101 active running
rethinkdb_services.service df763c2f.../172.17.8.101 active running
stickfigureninjas_discovery.service 78e5ab3e.../172.17.8.103 active running
stickfigureninjas_web.service 78e5ab3e.../172.17.8.103 active running
59. run a unit on ONLY one SPECIFIC node
[Unit]
Description=rethinkdb
After=docker.service
Requires=docker.service
[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill rethinkdb_services
ExecStartPre=-/usr/bin/docker rm rethinkdb_services
ExecStartPre=/usr/bin/docker pull dockerfile/rethinkdb
ExecStart=/usr/bin/docker run --name rethinkdb_services \
    -p 8080:8080 -p 28015:28015 -p 29015:29015 -v /home/core/rethinkdb:/data \
    -t dockerfile/rethinkdb rethinkdb -d /data --bind all
ExecStop=/usr/bin/docker stop rethinkdb_services
[X-Fleet]
MachineID=9f152bf8
60. see logging output from a running container
core@core-01 ~ $ fleetctl journal fantasy_web
-- Logs begin at Wed 2014-09-24 21:32:32 UTC, end at Thu 2014-09-25 19:55:26 UTC. --
Sep 25 18:22:08 core-02 docker[1572]: Python version: 2.7.6 (default, Mar 22 2014, 23:03:41)
Sep 25 18:22:08 core-02 docker[1572]: Python main interpreter initialized at 0xc53540
Sep 25 18:22:08 core-02 docker[1572]: python threads support enabled
Sep 25 18:22:08 core-02 docker[1572]: your server socket listen backlog is limited to 100 connections
Sep 25 18:22:08 core-02 docker[1572]: your mercy for graceful operations on workers is 60 seconds
Sep 25 18:22:08 core-02 docker[1572]: mapped 72768 bytes (71 KB) for 1 cores
Sep 25 18:22:08 core-02 docker[1572]: *** Operational MODE: single process ***
Sep 25 18:22:09 core-02 docker[1572]: WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0xc53540 pid: 13 (default app)
Sep 25 18:22:09 core-02 docker[1572]: *** uWSGI is running in multiple interpreter mode ***
Sep 25 18:22:09 core-02 docker[1572]: spawned uWSGI worker 1 (and the only) (pid: 13, cores:
61. core@core-03 ~ $ fleetctl journal karlgrz_web
-- Logs begin at Wed 2014-09-24 21:32:32 UTC, end at Thu 2014-09-25 19:56:33 UTC. --
Sep 25 18:21:58 core-03 sh[1315]: ---> Using cache
Sep 25 18:21:58 core-03 sh[1315]: ---> ce8cd32fe157
Sep 25 18:21:58 core-03 sh[1315]: Step 6 : RUN cd /srv && make publish
Sep 25 18:21:58 core-03 sh[1315]: ---> Using cache
Sep 25 18:21:58 core-03 sh[1315]: ---> 83f7f333889b
Sep 25 18:21:58 core-03 sh[1315]: Step 7 : CMD ["nginx"]
Sep 25 18:21:58 core-03 sh[1315]: ---> Using cache
Sep 25 18:21:58 core-03 sh[1315]: ---> 4cf274f01dae
Sep 25 18:21:58 core-03 sh[1315]: Successfully built 4cf274f01dae
Sep 25 18:21:59 core-03 systemd[1]: Started karlgrz.com.
62. core@core-02 ~/share/karlgrz-docker/fleet $ fleetctl journal classholes_web
-- Logs begin at Wed 2014-09-24 21:32:01 UTC, end at Thu 2014-09-25 20:03:55 UTC. --
Sep 25 20:01:40 core-02 systemd[1]: Starting classholes.com...
Sep 25 20:01:40 core-02 docker[3071]: Error response from daemon: No such container: classholes_web
Sep 25 20:01:40 core-02 docker[3071]: 2014/09/25 20:01:40 Error: failed to kill one or more containers
Sep 25 20:01:40 core-02 docker[3085]: Error response from daemon: No such container: classholes_web
Sep 25 20:01:40 core-02 docker[3085]: 2014/09/25 20:01:40 Error: failed to remove one or more containers
Sep 25 20:01:40 core-02 docker[3095]: Pulling repository karlgrz/ubuntu-14.04-base-nginx
Sep 25 20:01:42 core-02 systemd[1]: classholes_web.service: control process exited, code=exited status=1
Sep 25 20:01:42 core-02 systemd[1]: Failed to start classholes.com.
Sep 25 20:01:42 core-02 sh[3110]: /bin/sh: line 0: cd: /home/core/share/classholes: No such file or directory
Sep 25 20:01:42 core-02 systemd[1]: Unit classholes_web.service entered failed state.
63. terminate a node and see the services running on it moved to
another node in the cluster
karl@karl-mediafly:~/workspace/coreos-vagrant$ vagrant ssh core-03 -- -A
Last login: Thu Sep 25 16:37:01 2014 from 10.0.2.2
CoreOS (beta)
core@core-03 ~ $ shutdown -n
shutdown: invalid option -- 'n'
core@core-03 ~ $ shutdown
Must be root.
core@core-03 ~ $ sudo shutdown -n
shutdown: invalid option -- 'n'
core@core-03 ~ $ sudo shutdown
Shutdown scheduled for Thu 2014-09-25 16:46:14 UTC, use 'shutdown -c' to cancel.
Broadcast message from root@core-03 (Thu 2014-09-25 16:45:14 UTC):
The system is going down for power-off at Thu 2014-09-25 16:46:14 UTC!
64. core@core-02 ~ $ fleetctl list-units
UNIT MACHINE ACTIVE SUB
fantasy_discovery.service adddf8be.../172.17.8.102 active running
fantasy_web.service adddf8be.../172.17.8.102 active running
karlgrz_discovery.service adddf8be.../172.17.8.102 active running
karlgrz_web.service adddf8be.../172.17.8.102 active running
rethinkdb_discovery.service df763c2f.../172.17.8.101 active running
rethinkdb_services.service df763c2f.../172.17.8.101 active running
65. core@core-02 ~ $ fleetctl list-units
UNIT MACHINE ACTIVE SUB
fantasy_discovery.service adddf8be.../172.17.8.102 active running
fantasy_web.service adddf8be.../172.17.8.102 active running
jcsdoorsolutions_discovery.service df763c2f.../172.17.8.101 active running
jcsdoorsolutions_web.service df763c2f.../172.17.8.101 activating start-pre
karlgrz_discovery.service adddf8be.../172.17.8.102 active running
karlgrz_web.service adddf8be.../172.17.8.102 active running
rethinkdb_discovery.service df763c2f.../172.17.8.101 active running
rethinkdb_services.service df763c2f.../172.17.8.101 active running
stickfigureninjas_discovery.service df763c2f.../172.17.8.101 active running
stickfigureninjas_web.service df763c2f.../172.17.8.101 activating start-pre
67. core@core-02 ~ $ fleetctl list-units
UNIT MACHINE ACTIVE SUB
fantasy_discovery.service adddf8be.../172.17.8.102 active running
fantasy_web.service adddf8be.../172.17.8.102 active running
jcsdoorsolutions_discovery.service df763c2f.../172.17.8.101 active running
jcsdoorsolutions_web.service df763c2f.../172.17.8.101 active running
karlgrz_discovery.service adddf8be.../172.17.8.102 active running
karlgrz_web.service adddf8be.../172.17.8.102 active running
rethinkdb_discovery.service df763c2f.../172.17.8.101 active running
rethinkdb_services.service df763c2f.../172.17.8.101 active running
stickfigureninjas_discovery.service df763c2f.../172.17.8.101 active running
stickfigureninjas_web.service df763c2f.../172.17.8.101 active running
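To see exactly what moved, compare a listing from before the shutdown with one from after. The sketch below hard-codes the web units and their machine IPs from the earlier "list units" slide and from the listing above. The helper function is made up for this example and is not part of fleetctl.

```python
def moved_units(before, after):
    """Map each relocated unit to its (old machine, new machine) pair."""
    return {
        unit: (before[unit], after[unit])
        for unit in sorted(before)
        if unit in after and before[unit] != after[unit]
    }


# Placement before core-03 (172.17.8.103) was shut down.
before = {
    "fantasy_web.service": "172.17.8.102",
    "jcsdoorsolutions_web.service": "172.17.8.103",
    "karlgrz_web.service": "172.17.8.102",
    "stickfigureninjas_web.service": "172.17.8.103",
}

# Placement once the cluster settled again.
after = {
    "fantasy_web.service": "172.17.8.102",
    "jcsdoorsolutions_web.service": "172.17.8.101",
    "karlgrz_web.service": "172.17.8.102",
    "stickfigureninjas_web.service": "172.17.8.101",
}

for unit, (old, new) in moved_units(before, after).items():
    print(f"{unit}: {old} -> {new}")
```

Only the units that lived on the terminated machine show up. Everything on the surviving machines stayed where it was.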