Checkpoint/Restore is a technology that makes it possible to take a snapshot of running Linux processes and restore those processes at any other place and time. This opens up various possibilities such as live migration, protecting HPC tasks from hardware problems, cloud services, dynamic load balancing, etc. Despite being a very tempting feature to have, Linux lacked it for quite a long time.
The Checkpoint-Restore In Userspace (CRIU) project is the one that made this technology real. This talk covers the project's history, its dependence on and influence on the Linux kernel, and then moves on to usage scenarios that are possible with CRIU today and those that will become possible in the future.
The talk will be of interest to anyone who knows Linux as a user, but especially to system and distribution developers, advanced users, and anyone involved in containers, virtualization, HA or HPC.
1. CRIU: Time and Space Travel Service for Linux Applications
Kir Kolyshkin
Texas Linux Fest, 14 Jun 2014
2. Agenda
What is CRIU?
Project history and state
Usage scenarios
Live migration
Reboot-less kernel upgrade
Slow services startup
Advanced debugging and testing
and more...
3. What is CRIU?
Checkpoint Restore In Userspace
Checkpoint (or Dump)
Restore (or Restart)
Full info about state
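The checkpoint (dump) and restore steps named on this slide map onto two subcommands of the `criu` command line tool. The sketch below only builds the argument lists and does not run anything. The helper names, the PID 1234 and the directory `/tmp/ckpt` are illustrative assumptions, not taken from the talk. The flags used (`-t`, `-D`, `--shell-job`, `--leave-running`) are standard CRIU options, but check them against `criu --help` for your installed version.

```python
import shlex


def criu_dump_cmd(pid, images_dir, shell_job=False, leave_running=False):
    """Build (but do not run) the argv that checkpoints the process tree rooted at pid."""
    argv = ["criu", "dump", "-t", str(pid), "-D", images_dir]
    if shell_job:
        # Needed when the process was started from an interactive shell.
        argv.append("--shell-job")
    if leave_running:
        # Keep the original process alive after the dump (snapshot, not move).
        argv.append("--leave-running")
    return argv


def criu_restore_cmd(images_dir, shell_job=False):
    """Build (but do not run) the argv that restores a tree from the image files."""
    argv = ["criu", "restore", "-D", images_dir]
    if shell_job:
        argv.append("--shell-job")
    return argv


if __name__ == "__main__":
    # Hypothetical PID and image directory, for illustration only.
    print(shlex.join(criu_dump_cmd(1234, "/tmp/ckpt", shell_job=True)))
    print(shlex.join(criu_restore_cmd("/tmp/ckpt", shell_job=True)))
```

Running the resulting commands typically requires root privileges. The image directory is the "full info about state" that travels between the two steps.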
4. CRIU pre-history
● OpenVZ project
● Containers live migration feature
● Containers → Upstream Linux
● 1500+ kernel patches from us
● Kernel-level checkpoint-restore merge failed
● User-level checkpoint-restore ...
6. Some history
Project started almost 3 years ago
– an RFC on kernel memory API extension
– small command line tool
– minimal dump of process' internals
First release
– v0.1 -- 23 Jul 2012 (x86 and basic stuff)
Since then
– Kernel part completed a year ago (150+ kernel patches: new APIs for reading and setting process' state)
7. Current project state
The latest release
– v1.3rc1
– supports x86_64, ARM and AArch64
– supports the features that typical apps use
– works on unmodified linux-3.11+
– included in Debian, Fedora, Ubuntu, Arch, SUSE, Gentoo, CoreOS...
Explicitly checked
– Apache, nginx, Oracle*, mysql, mongodb
– ssh/sshd, openvpn, cron, sendmail
– Java, gcc, make
– VNC + { gimp, mplayer, blender, supertux }
– Screen + { bash, top, tcpdump, tar/bz2 }
* some kernel tweaks required
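The "works on unmodified linux-3.11+" requirement can be roughly checked by comparing the running kernel's release string against 3.11. This is a minimal sketch with hypothetical helper names. It looks only at the version number, so it cannot see features that a distribution has backported to or removed from its kernel. The tool's own `criu check` command is the more reliable test.

```python
import re

# Minimum upstream kernel version named on the slide.
MIN_KERNEL = (3, 11)


def kernel_version(release):
    """Extract (major, minor) from a release string such as '3.13.0-24-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def kernel_is_new_enough(release, minimum=MIN_KERNEL):
    """Return True if the release string is at or above the minimum version."""
    return kernel_version(release) >= minimum


if __name__ == "__main__":
    # Example release strings; on a live system pass platform.release() instead.
    for release in ("3.2.0-4-amd64", "3.11.0", "3.13.0-24-generic"):
        print(release, kernel_is_new_enough(release))
```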
8. Some vitals
- 55K lines of code
- 150+ kernel patches
- contribs from Google, Huawei, Samsung, Canonical
22. More (funny) use cases
Forgot to launch your program in screen
– Live-migrate it there
Playing a game without the save button
– Snapshot it
[Put your own use case here]
http://criu.org/Usage_scenarios
23. Recap
● Started as containers live-migration tool
● General tool to dump/restore apps state
● v1.2 + Linux-3.11+ can do the trick
● A lot of interesting technologies
– Memory tracker
– Migration of TCP connections
– Injecting your code into a running application
– Detecting kernel objects sharing
– etc.
24. Resources
http://criu.org – main site, documentation
http://git.criu.org – git repo with tool sources
http://plus.google.com/+CRIU – Google+ page
criu@openvz.org – mailing list
Kir Kolyshkin <kir@openvz.org> – that's me
Thank you!