3. Agenda
1. Secure Computing: the basics
2. Libseccomp
3. QEMU sandboxing v1
4. QEMU sandboxing v2 and more options
4. Secure Computing: the basics
● First kernel implementation dates from March 8th, 2005 (Linux 2.6.12)
Commit by: Andrea Arcangeli
● A process calls prctl() with PR_SET_SECCOMP; from then on it is
allowed to call only exit(), sigreturn(), read() and write()
○ Any other system call results in SIGKILL or SIGSYS
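
The strict mode described above can be tried out with a minimal sketch like the one below. It assumes Linux and reaches prctl() through ctypes; the numeric constants are copied from the kernel headers, and the helper name strict_mode_outcome is made up for this example. The work happens in a forked child so that the calling process is not locked down. Even a normal interpreter exit uses exit_group(), which is not in the allowed set, so the child is expected to end with SIGKILL.

```python
import ctypes
import os
import signal
import sys

PR_SET_SECCOMP = 22        # value from <linux/prctl.h>
SECCOMP_MODE_STRICT = 1    # value from <linux/seccomp.h>


def strict_mode_outcome():
    """Fork a child, enter strict mode in it, then issue a forbidden syscall.

    Returns "killed" if the kernel terminated the child with SIGKILL, or
    "unavailable" if strict mode could not be exercised on this system.
    """
    if not sys.platform.startswith("linux"):
        return "unavailable"
    libc = ctypes.CDLL(None, use_errno=True)
    libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
    pid = os.fork()
    if pid == 0:  # child process
        try:
            if libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0:
                os._exit(2)  # kernel refused: seccomp is not available
            os.getpid()      # getpid() is outside the four allowed calls
        finally:
            # Fallback exit; under strict mode this exit_group() call is
            # itself forbidden, so the child still ends with SIGKILL.
            os._exit(3)
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL:
        return "killed"
    return "unavailable"


if __name__ == "__main__":
    print(strict_mode_outcome())
```

On systems where seccomp is disabled or prctl() is filtered, the sketch reports "unavailable" instead of failing.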
5. Secure Computing: the basics
● Second kernel implementation, with dynamic seccomp policies:
January 11th, 2011; Commit by: Will Drewry <wad@chromium.org>
● Now used through the seccomp() system call
● Uses BPF (Berkeley Packet Filter)
○ An in-kernel data link layer packet filter that has an abstracted API that
also works as a generic filter
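
To make the BPF idea concrete, here is a small sketch that builds a classic-BPF allow-list program in the 8-byte instruction layout the kernel expects (struct sock_filter) and evaluates it with a toy interpreter instead of installing it. The opcode and return-value constants come from the kernel headers; the syscall numbers are the x86_64 ones; the function names and the three-instruction interpreter are illustrative only. A real filter should also check the arch field of seccomp_data, which is left out here for brevity.

```python
import struct

BPF_LD_W_ABS = 0x20            # BPF_LD | BPF_W | BPF_ABS
BPF_JMP_JEQ_K = 0x15           # BPF_JMP | BPF_JEQ | BPF_K
BPF_RET_K = 0x06               # BPF_RET | BPF_K
SECCOMP_RET_KILL = 0x00000000
SECCOMP_RET_ALLOW = 0x7FFF0000


def build_allowlist_filter(allowed_nrs):
    """Return a list of (code, jt, jf, k) tuples allowing only allowed_nrs."""
    allowed = list(allowed_nrs)
    prog = [(BPF_LD_W_ABS, 0, 0, 0)]          # load seccomp_data.nr (offset 0)
    for i, nr in enumerate(allowed):
        # On a match, jump over the remaining tests and the KILL return.
        # jt is an 8-bit field, so this layout supports at most 255 entries.
        prog.append((BPF_JMP_JEQ_K, len(allowed) - i, 0, nr))
    prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_KILL))
    prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW))
    return prog


def encode_filter(prog):
    """Pack the program as an array of struct sock_filter (8 bytes each)."""
    return b"".join(struct.pack("=HBBI", *ins) for ins in prog)


def run_filter(prog, nr):
    """Toy interpreter for the three opcodes used above."""
    # struct seccomp_data: int nr; u32 arch; u64 ip; u64 args[6]
    data = struct.pack("=iIQ6Q", nr, 0, 0, 0, 0, 0, 0, 0, 0)
    acc, pc = 0, 0
    while True:
        code, jt, jf, k = prog[pc]
        pc += 1
        if code == BPF_LD_W_ABS:
            acc = struct.unpack_from("=I", data, k)[0]
        elif code == BPF_JMP_JEQ_K:
            pc += jt if acc == k else jf
        elif code == BPF_RET_K:
            return k
        else:
            raise ValueError("unsupported opcode")


if __name__ == "__main__":
    # x86_64 numbers: read=0, write=1, rt_sigreturn=15, exit=60
    prog = build_allowlist_filter([0, 1, 15, 60])
    print(hex(run_filter(prog, 1)))    # write
    print(hex(run_filter(prog, 39)))   # getpid
```

The packed bytes from encode_filter are what would be handed to the kernel inside a struct sock_fprog; the interpreter only exists to show how the jumps resolve.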
7. Libseccomp
● Paul Moore (2011)
● Userspace layer to make life easier:
○ Abstract complex BPF constructions
○ Abstract differences between architectures and their ABIs
○ Optimize filter construction for best performance
○ On a filter match: kill (SIGKILL), trap (SIGSYS) or allow (among
other actions)
11. QEMU sandboxing v1
● Basic whitelist approach (--sandbox on)
○ Every system call is blocked, except for the ones that are explicitly
whitelisted
● Various compatibility problems; requires a lot of testing under
different workloads
● It’s safe, right?
13. QEMU sandboxing v1
Not really!
● QEMU links to many different shared libraries, and there is no way
to determine which code paths QEMU triggers in these libraries and
thus identify which syscalls will genuinely be needed.
● Sometimes you miss a syscall and QEMU aborts right at the beginning,
before boot (which is good?), but sometimes your VM has been running
for days and it suddenly aborts (which is terrible)
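
The late-abort problem can be illustrated with a toy model. Everything in it is hypothetical: the allow-list contents, the traces and the function name are invented for the example and are not taken from QEMU. It only shows why a short test run can pass while a rarely taken library code path trips the whitelist much later.

```python
def first_violation(trace, allowlist):
    """Return (index, name) of the first syscall not in allowlist, or None."""
    for index, name in enumerate(trace):
        if name not in allowlist:
            return index, name
    return None


# Hypothetical whitelist assembled from what was seen during testing.
allowlist = {"read", "write", "ioctl", "ppoll"}

# A short boot test only exercises whitelisted calls.
boot_trace = ["read", "ioctl", "write"]

# A long-running guest eventually reaches a code path that was never tested.
long_trace = ["ppoll", "ioctl", "read"] * 1000 + ["madvise"]

if __name__ == "__main__":
    print(first_violation(boot_trace, allowlist))
    print(first_violation(long_trace, allowlist))
```

In the model the boot trace passes cleanly, while the long trace is stopped only at its very last call, which is the "running for days, then abort" case from the slide.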
14. QEMU sandboxing v2
● Extended blacklist approach (--sandbox on,...)
● Everything is allowed except for a few sets that are definitely not
allowed
○ Default system calls: basic set of forbidden system calls (kexec, swapon,
swapoff, mount, umount, etc.)
○ obsolete
○ elevateprivileges
○ spawn
○ resourcecontrol
15. Obsolete system calls
● Old system calls that were useful in the past but became obsolete or
were replaced by newer versions
○ Like readdir() being replaced by getdents()
● Blocked by default, with an option left to enable them:
--sandbox on,obsolete=allow
16. Elevated Privileges
● This option blocks all set*uid|gid system calls, which are known
to be required by some features such as bridge helpers
● This option also calls prctl(PR_SET_NO_NEW_PRIVS), which
prevents new threads from escalating privileges as well
● This mode can be switched on or off with the option:
--sandbox on,elevateprivileges=allow|deny|children
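
The PR_SET_NO_NEW_PRIVS call mentioned above can be observed with a short sketch. It assumes Linux, uses ctypes to reach prctl(), and takes the two constants from the kernel headers; the helper name is made up for this example. The flag cannot be cleared once set, so the sketch sets it in a forked child and reports what PR_GET_NO_NEW_PRIVS returns there.

```python
import ctypes
import os
import sys

PR_SET_NO_NEW_PRIVS = 38   # values from <linux/prctl.h>
PR_GET_NO_NEW_PRIVS = 39


def no_new_privs_in_child():
    """Set no_new_privs in a forked child and read the flag back.

    Returns 1 if the flag ended up set, 0 if it did not, or None when the
    check cannot be run on this platform.
    """
    if not sys.platform.startswith("linux"):
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
    pid = os.fork()
    if pid == 0:  # child process
        code = 0
        try:
            if libc.prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == 0:
                if libc.prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0) == 1:
                    code = 1
        finally:
            os._exit(code)  # report the flag through the exit status
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None


if __name__ == "__main__":
    print(no_new_privs_in_child())
```

Once set, the flag is inherited across fork() and execve(), so a later exec of a setuid binary no longer grants extra privileges.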
17. Spawn
● This option prevents any new process from being created through
fork() or exec(), privileged or not.
● Features such as the bridge helper, SMB server, ifup/ifdown scripts and
the migration exec: protocol are all disabled.
● This mode can be switched on or off with the option:
--sandbox on,spawn=allow|deny
18. Resource Control
● Prevents QEMU from setting process affinity, scheduler priority, etc.
● Doing this shouldn’t be QEMU’s responsibility, but rather that of management
software such as libvirt.
● This mode can be switched on or off with the option:
--sandbox on,resourcecontrol=allow|deny
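
The way the four options combine can be sketched as a toy model. This is not QEMU's implementation. The default values are assumptions (obsolete calls denied as stated in the slides, the other three sets assumed to be allowed unless asked otherwise), and the syscall lists are tiny illustrative samples rather than QEMU's real sets.

```python
# Assumed defaults when only "on" is given (see the caveat above).
DEFAULTS = {
    "obsolete": "deny",
    "elevateprivileges": "allow",
    "spawn": "allow",
    "resourcecontrol": "allow",
}

# Values each option accepts, as listed in the slides.
VALUES = {
    "obsolete": {"allow", "deny"},
    "elevateprivileges": {"allow", "deny", "children"},
    "spawn": {"allow", "deny"},
    "resourcecontrol": {"allow", "deny"},
}

# Illustrative samples only, not the real per-set lists.
SAMPLE_SYSCALLS = {
    "default": {"kexec_load", "swapon", "swapoff", "mount", "umount2"},
    "obsolete": {"readdir"},
    "elevateprivileges": {"setuid", "setgid"},
    "spawn": {"fork", "execve"},
    "resourcecontrol": {"sched_setaffinity", "setpriority"},
}


def parse_sandbox(arg):
    """Parse a string such as "on,spawn=deny" into (enabled, settings)."""
    head, *rest = arg.split(",")
    if head not in ("on", "off"):
        raise ValueError(f"expected on or off, got {head!r}")
    settings = dict(DEFAULTS)
    for item in rest:
        key, _, value = item.partition("=")
        if value not in VALUES.get(key, ()):
            raise ValueError(f"bad sandbox option: {item!r}")
        settings[key] = value
    return head == "on", settings


def denied_categories(arg):
    """Return the names of the syscall sets that end up blocked."""
    enabled, settings = parse_sandbox(arg)
    if not enabled:
        return set()
    # "children" is not a full deny, so it does not add the set here.
    return {"default"} | {k for k, v in settings.items() if v == "deny"}


def is_blocked(arg, syscall):
    """True if syscall belongs to one of the denied sample sets."""
    return any(syscall in SAMPLE_SYSCALLS[c] for c in denied_categories(arg))


if __name__ == "__main__":
    print(sorted(denied_categories("on,spawn=deny,resourcecontrol=deny")))
    print(is_blocked("on,spawn=deny", "fork"))
```

Anything not listed in a denied set is allowed, which is the inverse of the v1 whitelist behaviour.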
20. Some thoughts on QEMU sandboxing
● Sandboxing is not the definitive solution for virtualization security,
but rather a good layer to stack on top of others such as:
○ MAC/DAC (Mandatory Access Control and Discretionary Access Control)
○ SELinux
○ Remote Management using SSH/TLS/SSL
○ Guest Image cryptography
○ Virtual Trusted Platform Module (vTPM)
● The sandbox v2 options are not low-level knobs to control system calls but
rather high-level knobs to control concepts.