Andrey Vagin <avagin@openvz.org>
1 June 2013, Moscow
Linux Containers
Fedora Virtualization Day
Different types of Virtualization
● Virtual Machines
– Emulation (qemu)
– Paravirtualization (XEN)
– Hardware Virtualization (KVM, ESX)
● OS Level Virtualization
– Containers (Linux Containers, Solaris Zones, BSD Jails)
Virtual Machine (VM)
[Diagram: Hardware, then a Hypervisor, then four guests, each with its own Virtual HW, Kernel and Apps]
Containers (CT)
[Diagram: Hardware, then one Host Kernel, then four containers, each with its own Namespaces and Apps]
- chroot() on steroids
Comparison: VMs vs CTs
● One real HW, many virtual HW, many OSes
● One real HW, one kernel, many userspace instances
● Full control over the guest OS
● Native performance: [almost] no overhead
● High density
● KSM (Kernel SamePage Merging)
● Use resources on demand
● Dynamic resource allocation
● Naturally share pages
● Depends on hardware (VT-x, VT-d, EPT, etc.)
● Not all functionality is virtualized
● Flexibility
Evolution of the Operating System
● Multitask: many processes
● Multiuser: many users
● Multicontainer: many containers
Containers (CT)
Cgroups
– control resources
● cpu, cpuacct, cpuset
● blkio
● memory
● net_cls
Namespaces
– isolate environments
● MNT
● PID
● NET
● IPC
● User
● UTS
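As a rough illustration of the two lists above, the sketch below inspects which namespaces and cgroups a process belongs to through procfs. It assumes a Linux kernel that exposes `/proc/<pid>/ns` as symlinks of the form `uts:[4026531838]` and `/proc/<pid>/cgroup` lines of the form `hierarchy-ID:controllers:path`. The function names are made up for this example and are not part of any container tool.

```python
import os
import re

# A namespace handle is a symlink such as /proc/self/ns/uts -> "uts:[4026531838]".
_NS_LINK = re.compile(r"^(?P<kind>\w+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target like 'uts:[4026531838]' into (kind, inode)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def list_namespaces(pid="self"):
    """Return {entry name: inode} for a process, or {} if /proc/<pid>/ns is absent."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    if not os.path.isdir(ns_dir):
        return result
    for name in sorted(os.listdir(ns_dir)):
        try:
            _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue  # unreadable or unexpected entry: skip it
        result[name] = inode
    return result


def parse_cgroup_line(line):
    """Parse one /proc/<pid>/cgroup line into (hierarchy id, controllers, path)."""
    hierarchy_id, controllers, path = line.strip().split(":", 2)
    return int(hierarchy_id), [c for c in controllers.split(",") if c], path


if __name__ == "__main__":
    print(list_namespaces())
    print(parse_cgroup_line("3:cpu,cpuacct:/ct101"))
```

Two processes share a namespace exactly when the inode numbers reported for that entry are equal, which gives a quick way to check whether a task is running inside a container.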
How to execute CT
All allowed by default
● unshare, nsenter
● Systemd Lightweight Containers
● LXC
● Libvirt LXC
All restricted by default
● OpenVZ (vzctl-core) (FC19)
vzctl - perform various operations on a container
# yum install -y vzctl-core
# vzctl create 101 --ostemplate fedora-15
# vzctl start 101
# vzctl exec 101 ps axf
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:00 init
11830 ?        Ss     0:00 syslogd -m 0
11897 ?        Ss     0:00 /usr/sbin/sshd
11943 ?        Ss     0:00 xinetd -stayalive -pidfile ...
12218 ?        Ss     0:00 sendmail: accepting connections
12265 ?        Ss     0:00 sendmail: Queue runner@01:00:00
13362 ?        Ss     0:00 /usr/sbin/httpd
13363 ?        S      0:00  \_ /usr/sbin/httpd
..............................................
 6416 ?        Rs     0:00 ps axf
# vzctl stop 101
# vzctl destroy 101
OpenVZ-kernel-only features
● Ploop (snapshots, backups, different formats)
● Second-level quota
● More functional memory accounting
● PFCache (memory deduplication, I/O savings)
● Stronger isolation than on the FC19 kernel (which lacks userns)
Questions?
http://openvz.org
Andrey Vagin <avagin@openvz.org>
CRIU - Checkpoint/Restore in User-space
What is C/R and how can it be used?
C/R is the ability to save states of processes and to restore them later.
Usage scenarios:
– Failure recovery
– Live migration
– Reboot-less upgrade
– Speed up of slow-boot services
– HPC issues
History
● Berkeley Lab Checkpoint/Restart (BLCR) (2003)
– Load a kernel module and link with a library
● DMTCP: Distributed MultiThreaded CheckPointing (2004-2006)
– Preload a library
● OpenVZ (2005)
– OpenVZ kernel
● Linux Checkpoint/Restart by Oren Laadan (2008)
– A non-mainline kernel
● CRIU (2011)
[Timeline: BLCR 2003, OpenVZ 2005, DMTCP 2007, Linux C/R 2008, CRIU 2011]
How does this work?
[Diagram: process tree and kernel objects (name-spaces, files, sockets, pipes), crtools, image files]
Kernel interfaces
Dump Restore
syscalls
netlink
/proc/
ptrace
Dump
● Parasite code
– Receive file descriptors
– Dump memory content
– prctl(), sigaction, pending signals, timers, etc.
● Ptrace
– Freeze processes
– Inject the parasite code
● Netlink
– Get information about sockets, netns
● Procfs
– /proc/PID/maps, /proc/PID/map_files/, /proc/PID/status, /proc/PID/mountinfo
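To make the procfs step more concrete, here is a minimal sketch of parsing one line of `/proc/PID/maps`, the file that lists a task's memory mappings. It is not CRIU's own parser. The `Mapping` type and `parse_maps_line` are names invented for this example, and the sample line is illustrative rather than taken from a real dump.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int   # first address of the mapping
    end: int     # address just past the end of the mapping
    perms: str   # e.g. "r-xp": read, write, execute, private/shared
    offset: int  # offset into the backing file
    dev: str     # "major:minor" of the backing device
    inode: int   # inode of the backing file, 0 for anonymous memory
    path: str    # backing file or pseudo-name such as "[stack]"; may be empty


def parse_maps_line(line):
    """Parse one line in the /proc/PID/maps format into a Mapping."""
    fields = line.strip().split(None, 5)
    addr, perms, offset, dev, inode = fields[:5]
    path = fields[5] if len(fields) > 5 else ""
    start, end = (int(part, 16) for part in addr.split("-"))
    return Mapping(start, end, perms, int(offset, 16), dev, int(inode), path)


if __name__ == "__main__":
    sample = "00400000-0040b000 r-xp 00000000 08:01 1048602 /usr/bin/cat"
    mapping = parse_maps_line(sample)
    print(mapping.path, hex(mapping.end - mapping.start))
```

A dumper needs this information for every mapping so that the same address ranges, permissions and backing files can be recreated on restore.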
Restore
● Collect shared objects
● Restore name-spaces
● Create a process tree
– Restore SID, PGID
– Restore objects that should be inherited
● Files, sockets, pipes, ...
● Restore per-task properties.
● Restore memory
● Call sigreturn
● Awesome
Interesting details
● How to restore shared objects?
– Send file descriptors via unix sockets
– Map files from /proc/self/map_files/ to restore anonymous shared mappings
● How to restore memory mappings at the correct addresses?
– Map a new code block and a stack
– Unmap crtools' mappings
– Remap the task's mappings to the correct addresses
● How to resume a process?
– Create a signal frame
– Call sigreturn()
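The first answer above, sending file descriptors via unix sockets, relies on SCM_RIGHTS ancillary messages. The sketch below shows the mechanism in isolation, within a single process for simplicity. It is not how crtools is implemented, and `send_fd`, `recv_fd` and `demo` are names chosen for this example.

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Send one file descriptor over a Unix socket as SCM_RIGHTS ancillary data."""
    fds = array.array("i", [fd])
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock):
    """Receive one file descriptor that was sent with send_fd()."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Drop any trailing partial integer before decoding.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
            return fds[0]
    raise RuntimeError("no file descriptor received")


def demo():
    """Pass the read end of a pipe through a socket pair and read from the copy."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        send_fd(left, read_end)
        received = recv_fd(right)  # a new descriptor referring to the same pipe
        os.write(write_end, b"ping")
        data = os.read(received, 4)
        os.close(received)
        return data
    finally:
        os.close(read_end)
        os.close(write_end)
        left.close()
        right.close()


if __name__ == "__main__":
    print(demo())
```

The received descriptor refers to the same open file as the one that was sent, which is why this mechanism suits objects that several restored tasks must share.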
Kernel impact
● ~140 patches merged
● ~10 patches in flight
● ~11 new features appeared
● ~2 new features to come
New features in the kernel
● Parasite code injection (by Tejun Heo)
– Read task state that a task can otherwise retrieve only about itself
● The kcmp() system call
– Helps check which kernel objects are shared between processes
● Proc map_files directory
– Find out exactly which file is mapped
– Information about shared mappings
● A bunch of prctl extensions
– Set various private fields on task/mm objects (C/R-only feature)
● Last-pid sysctl
– Restore a task with the desired PID value
New features in the kernel (continued)
● TCP repair mode
– Read the internal state of a TCP connection and reconstruct it from scratch on a freshly created socket
● Socket information dumping via netlink (sock_diag)
– Extensible engine for retrieving socket state
● Virtual net device indexes
– Allow restoring network devices in a namespace
● Socket peeking offset
– Allows peeking into socket queues (reading without removing data from the queue)
● Task memory tracking
– Incremental snapshots, online migration
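Peeking, as mentioned above, means reading queued data without consuming it. The sketch below demonstrates only the plain `MSG_PEEK` flag on a local socket pair. The kernel's peek-offset feature (the `SO_PEEK_OFF` socket option) extends this by letting the reader peek from a chosen offset, which this example does not exercise. `peek_then_read` is a name invented for the illustration.

```python
import socket


def peek_then_read(payload):
    """Send a small payload, peek at it, then read it for real."""
    writer, reader = socket.socketpair()
    try:
        writer.sendall(payload)
        # MSG_PEEK copies the data out but leaves it in the receive queue.
        peeked = reader.recv(len(payload), socket.MSG_PEEK)
        # A normal recv() then returns the same bytes and removes them.
        consumed = reader.recv(len(payload))
        return peeked, consumed
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    print(peek_then_read(b"hello"))
```

A checkpoint tool can save the contents of a socket queue this way while leaving the queue intact for the task that continues to run.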
What is already supported?
– X86_64 architecture
– Process tree linkage
– Multi-threaded apps
– All kinds of memory mappings
– Terminals, groups, sessions
– Open files (shared and unlinked)
– Established TCP connections
– Unix sockets, Packet sockets
– Name-spaces (net, mount, ipc)
– Non-posix files (epoll, inotify)
– Pipes, Fifo-s, IPC, ...
– ARM architecture
– Pending signals
– TCP time-stamps
– Iterative snapshots
– VDSO
– LXC and OpenVZ containers
In flight
– Posix timers
– Convert OpenVZ images
How is CRIU tested?
● ZDTM – a set of unit-tests
● Real-life applications
– Apache, Nginx
– MySQL, MongoDB, Oracle
– Make && gcc
– Tar & gzip
– Screen
– Java
– LXC
– VNC server + GUI applications
Future plans (Feb 2013)
● Support all kinds of kernel objects
● Merge all in-flight patches into the mainline kernel
● Integrate CRIU with OpenVZ and LXC utilities
● Iterative migration
– Migrate memory content before freezing applications
● Integration in distributions
– CRIU was accepted into Fedora 19
How to use
● ./crtools dump -t pid [<options>]
– checkpoint a process/tree identified by pid
● ./crtools restore -t pid [<options>]
– restore a process/tree identified by pid
● ./crtools show (-D dir)|(-f file) [<options>]
– show the contents of dump file(s)
● ./crtools check
– check whether the kernel support is up-to-date
● ./crtools exec -t pid <syscall-string>
– execute a system call in another task
Checkpoint/restore of a VNC server.
Questions?
http://criu.org

Editor's notes

  1. BLCR uses a kernel module and doesn't checkpoint sockets, SysV IPC, zombies, etc. Applications must be linked with a library and executed via a helper. DMTCP uses a launcher too, but doesn't require a kernel module. C/R in OpenVZ is used to checkpoint, restore and migrate OpenVZ containers. It requires the OpenVZ kernel. Linux C/R is very similar to OpenVZ C/R. It is used for checkpoint/restore of LXC. CRIU combines all these projects. It will work on the pure upstream kernel. It's able to dump a task without any preparation.