Unikernels: the Rise of
the Library Hypervisor
Anil Madhavapeddy, @avsm
Mindy Preston, @yomimono
Martin Lucina
+the MirageOS and Docker for Mac/Win teams
Docker Inc, @docker
with contributions from IBM
Docker Distributed Systems Summit
7th October 2016, Berlin, Germany
Conventional hypervisors
• Run full guest operating
systems with complex
emulation needs.
• Scaffolding for device
emulation, instruction
emulation, etc.
• Hard to compose into existing
infrastructure without wrapping
a full hypervisor layer.
[Diagram: Xen Hypervisor on Hardware, with Dom0 (qemu, xenstored, xenconsoled) and DomU]
Conventional hypervisors
CVE-2016-3710: VGA emulation
missing bounds checks causes exploit.
CVE-2016-5403: unbounded virtio
memory usage causes DoS.
CVE-2016-3672: unrestricted qemu
logging causes DoS.
CVE-2015-8554: qemu-dm buffer
overrun in MSI-X causes exploit.
CVE-2015-7504: heap overflow in
pcnet emulator causes exploit.
How can distributed systems
use hardware protection more
flexibly and composably?
Recap: Unikernels
• "library operating systems"
break kernels into libraries.
• Link libraries with a boot layer,
scheduler and application.
• Portable microservices that boot
directly on hypervisors or Unix.
[Diagram labels: App, Docker, Linux, Hardware; Application, Libraries; Configuration, Business Logic, HTTP, JSON, SSL, TCP/IP; Xen, Devices; Unix, libev, musl libc; Hardware]
Recap: Unikernels
• Many benefits are lost when
deploying on existing clouds.
• Tiny binaries (200k) still require
scaffolding of a full OS to boot.
• Difficult to manage hypervisor
from inside a container as full
host privilege is needed.
Library Hypervisors
• Extend the "kit" model and break down hypervisor
functionality into libraries.
• Expose core functionality (CPU and memory) as library,
and other pieces (device emulation) are optional.
• Benefit: huge reduction in TCB, and better fit to
container-native infrastructure with privilege dropping.
• Drawback: no existing support in operating systems.
Library Hypervisors
• Extend the "kit" model and break down hypervisor
functionality into libraries.
• Expose core functionality (CPU and memory) as library,
and other pieces (device emulation) are optional.
• Benefit: huge reduction in TCB, and better fit to
container-native infrastructure with privilege dropping.
• Drawback: no existing support in operating systems.
But let's take a closer look!
What has changed?
• OSX: Hypervisor.framework (xHyve, HyperKit; used by Docker for Mac)
• Linux: /dev/kvm (kvmtool, novm, ukvm; ukvm used by MirageOS 3)
• FreeBSD: bHyve
bhyve.org
xhyve.org
github.com/docker/hyperkit
• Easy drag and drop installation, and
autoupdates to get latest Docker.
• Secure, sandboxed virtualisation
architecture without elevated privileges.
• Native networking support, with VPN and
network sharing compatibility.
• File sharing between container and host:
uid mapping, inotify events, etc.
Docker for Mac
Aiming for a native OSX experience
that works with existing developer
workflows.
• Uses the new HyperKit framework, which is in turn
based on xHyve and FreeBSD's bHyve.
• Sandbox friendly: processes largely run as non-root,
with privileges of the local user.
Virtualisation
[Diagram labels: OSX Kernel (Hypervisor.framework); Hardware virt (VMX, nested paging); Userspace: User Process (Thread/vCPU, traps on I/O pages, manages ACPI and PCI devices); Linux Kernel (VirtIO IPC, VirtIO Block, VirtIO Net); Alpine Linux userspace with latest Docker preconfigured; QCow2; VPNKit; logs redirected to OSX host]
• Uses the new HyperKit framework, which is in turn
based on xHyve and FreeBSD's bHyve.
• Embeds Linux: includes an embedded
lightweight Alpine Linux distribution optimised for
fast boot and stateless operation for containers.
Virtualisation
$ docker info
Containers: 358
Running: 13
Paused: 0
Stopped: 345
Images: 485
Server Version: 1.11.1
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge null host
Kernel Version: 4.4.9-moby
Operating System: Alpine Linux v3.3
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.858 GiB
HyperKit library structure
• In HyperKit, most functionality is linked as a library.
• If app doesn't need a protocol, it is not linked and
not part of the trusted computing base.
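To make the "link only what you need" point concrete, here is a minimal sketch in Python. It is not HyperKit code: the module table, `Monitor` and `build_monitor` are hypothetical names invented for illustration, and the handlers are stand-ins for real device emulation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable

# Hypothetical device modules: name -> handler for a guest I/O request.
# None of these names or handlers come from HyperKit itself.
AVAILABLE_MODULES: Dict[str, Callable[[bytes], bytes]] = {
    "virtio-net": lambda req: b"net:" + req,
    "virtio-block": lambda req: b"blk:" + req,
    "virtio-ipc": lambda req: b"ipc:" + req,
}


@dataclass
class Monitor:
    """The core plus only the device modules that were asked for."""
    modules: Dict[str, Callable[[bytes], bytes]] = field(default_factory=dict)

    def handle(self, device: str, request: bytes) -> bytes:
        # A module that was never linked in does not exist for the guest,
        # so its code cannot be reached or exploited.
        if device not in self.modules:
            raise LookupError(f"{device} is not part of this monitor")
        return self.modules[device](request)


def build_monitor(wanted: Iterable[str]) -> Monitor:
    """Compose a monitor from the requested modules only."""
    return Monitor({name: AVAILABLE_MODULES[name] for name in wanted})
```

A monitor built with `build_monitor(["virtio-net"])` answers network requests and raises `LookupError` for block requests, which is the property the slide describes: code that is not linked is not in the trusted computing base.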
• Want to hide the gory details of virtualisation from
the user. The Linux VM should be "invisible".
• Not solving this leads to many user complaints:
• VPN software and corporate installations do not
like bridged virtual machines or custom routing.

Result: container traffic cannot connect to Internet.
• Services cannot be exposed on localhost or
the external interface and are instead on the Linux
VM IP address.

Result: breaks common web OAuth workflows.
Networking
Networking
[Diagram labels: OSX Kernel (Hypervisor.framework); Hardware virt (VMX, nested paging); Userspace: HyperKit (VirtIO IPC, VirtIO Block, VirtIO Net); Containers; Ethernet In; Bridge; Ethernet; Kernel Module]
Networking
[Diagram labels: OSX Kernel (Hypervisor.framework, Kernel Sockets); Hardware virt (VMX, nested paging); Userspace: HyperKit (VirtIO IPC, VirtIO Block, VirtIO Net); Ethernet In; VPNKit (MirageOS TCP/IP, DNS, Socketer); Containers]
github.com/docker/vpnkit
• Challenge: Deal with custom VPN software on the
host that makes it difficult to bridge.
• Solution: VPNKit efficiently reconstructs container
traffic into separate TCP/IP flows and translates
them into native OSX/Windows sockets.
• Benefits:
• All network traffic is generated from normal socket
calls (e.g. gethostbyaddr) on the Mac, so
interacts well with firewalls, VPNs, and any local
security policies.
Networking
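The flow-reconstruction idea can be sketched as a table that maps each guest TCP 4-tuple to one ordinary host socket. This is an illustrative Python sketch under stated assumptions, not VPNKit's implementation (VPNKit is built from MirageOS libraries); `FlowTable` and its methods are hypothetical names, and parsing Ethernet frames into flows is assumed to have happened already.

```python
import socket
from typing import Callable, Dict, Tuple

# (source ip, source port, destination ip, destination port) as seen in the guest
FlowKey = Tuple[str, int, str, int]


class FlowTable:
    """Maps each guest TCP flow to one native host socket."""

    def __init__(self, connect: Callable[..., socket.socket] = socket.create_connection):
        self._connect = connect
        self._flows: Dict[FlowKey, socket.socket] = {}

    def socket_for(self, key: FlowKey) -> socket.socket:
        # The first segment of a flow opens a normal host connection to the
        # flow's destination; later segments reuse the same socket.
        sock = self._flows.get(key)
        if sock is None:
            sock = self._connect((key[2], key[3]))
            self._flows[key] = sock
        return sock

    def send_payload(self, key: FlowKey, payload: bytes) -> None:
        """Replay reassembled guest payload through the host socket."""
        self.socket_for(key).sendall(payload)

    def close(self, key: FlowKey) -> None:
        """Tear down the host socket when the guest flow ends."""
        sock = self._flows.pop(key, None)
        if sock is not None:
            sock.close()
```

Because every outbound connection is an ordinary socket opened by a host process, the host's firewall, VPN and routing policy apply to it exactly as they would to any other local application.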
• Challenge: Services publishing ports should be
exposed on localhost without needing VM info.
• Solution: VPNKit forwards container port requests
to an OSX service which binds them natively on its
external interface.
• Benefits:
• docker run -P on the Mac now works without
requiring any knowledge of the VM innards.
• External OAuth workflows operate with web apps.
Networking
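The port-publishing path amounts to a host-side process that binds the published port natively and relays bytes to the service behind it. The Python sketch below shows that general technique only; `forward_port` and its helpers are hypothetical names, and a plain TCP connection stands in for the transport into the VM, whose details the slides do not give here.

```python
import socket
import threading


def _pump(src, dst):
    """Copy bytes one way until src reaches EOF, then half-close dst."""
    try:
        while True:
            data = src.recv(65536)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def _relay(client, backend_addr):
    """Relay one accepted connection in both directions, then close both ends."""
    with client, socket.create_connection(backend_addr) as upstream:
        back = threading.Thread(target=_pump, args=(upstream, client), daemon=True)
        back.start()
        _pump(client, upstream)
        back.join()


def forward_port(listen_addr, backend_addr):
    """Bind listen_addr on the host and relay each connection to backend_addr.

    Returns the listening socket; closing it stops the accept loop.
    """
    listener = socket.create_server(listen_addr)  # Python 3.8+

    def accept_loop():
        while True:
            try:
                client, _ = listener.accept()
            except OSError:
                return  # listener was closed
            threading.Thread(
                target=_relay, args=(client, backend_addr), daemon=True
            ).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener
```

For example, `forward_port(("127.0.0.1", 8080), backend)` would make a service reachable on localhost without the client knowing where `backend` lives, which is the effect the slide describes for `docker run -P`.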
• Native OSX application, uses HyperKit to virtualise
for domain-specific purpose ("docker run")
• Links MirageOS unikernel libraries for networking
and storage translation between OS boundaries.
• The library approach let us glue together these
components really easily.
• Docker for Mac is quite a complex distributed
system internally, but (hopefully) hidden from user.
Docker for Mac + unikernels
MirageOS 3 + Solo5
• Unikernels have been gathering pace; next
challenge is to make them easily deployable.
• Build handled via Docker, but docker run
shouldn't need privileges (e.g. to start a VM).
• MirageOS 3 has a new library hypervisor for
Linux, developed by IBM, Docker and
Cambridge University contributors.
mirage.io
MirageOS 3 + Solo5
• Source: https://github.com/Solo5/solo5
• Runs as a Unix process and opens /dev/kvm for
hardware isolation.
• ukvm is a small, modular monitor that links only what is
needed. Can be 10k in size!
• Can run privilege separated: one process opens /dev/kvm,
drops privileges and executes the unikernel.
• Boot times are the same as process fork times, since all
the device setup is handled in-process.
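The privilege-separation pattern in the bullets above (open the device, drop privileges, run the workload) can be sketched in Python as follows. This is not how ukvm is implemented; `launch_with_device` is a hypothetical helper, and the only point is that an open file descriptor survives into a child that has no right to open the device itself.

```python
import os
import subprocess
import sys


def launch_with_device(device_path, argv, drop_to=None):
    """Open device_path, then run argv with that descriptor inherited.

    drop_to is an optional (uid, gid) pair. When given, the child switches
    to it before exec, so only the parent ever needs permission to open the
    device. The descriptor number is appended to the child's arguments.
    """
    fd = os.open(device_path, os.O_RDWR)
    try:
        extra = {}
        if drop_to is not None:
            uid, gid = drop_to
            # Python 3.9+: applied in the child between fork and exec.
            extra = {"user": uid, "group": gid, "extra_groups": []}
        # pass_fds keeps fd open in the child under the same number.
        return subprocess.run(argv + [str(fd)], pass_fds=(fd,), **extra)
    finally:
        os.close(fd)  # the parent no longer needs its copy
```

On a Linux host with KVM one might call `launch_with_device("/dev/kvm", ["./monitor"], drop_to=(uid, gid))` from a privileged parent, where `./monitor` and the uid/gid pair are placeholders; any other device node, such as `/dev/null`, is enough to try the hand-off without privileges.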
MirageOS 3 + Solo5
Source: Dan Williams and Ricardo Koller, IBM Research, HotCloud 16
MirageOS 3 + Solo5
• Due for stable release in the next month.
• Intended to be a "unikernel template" for
other projects to share hypervisor code.
• Liberally licensed under BSD/Apache2/ISC
to encourage adoption and embedding.
• BoF and tutorials tomorrow to demonstrate
it. Developers are all here and hacking!
Demo!
How can distributed systems
use hardware protection more
flexibly and composably?
Questions?
Download free at
docker.com
Twitter: @avsm
https://github.com/docker/hyperkit
https://github.com/docker/vpnkit
https://github.com/docker/datakit
https://github.com/mirage/
We will be
hacking
tomorrow!
Backup Slides
• Challenge: Share arbitrary OSX directory tree into
Linux container without requiring extensive
modification of either side.
• Solution: Use a FUSE forwarding layer and
translate Linux filesystem calls to OSX equivalents.
[Diagram labels: OSX Host (com.docker.osxfs: track extra metadata, translate to OSX filesystem calls); Linux Host (FUSE); Container (VOLUME)]
Filesystem Sharing
• Challenge: Need filesystem activation so events on
the Mac wake up container servers and vice-versa.
• Solution: osxfs uses FSEvents API and injects
inotify activation events into container.
[Diagram labels: OSX Host (com.docker.osxfs: FSEvents watches open files; events from Linux cause OSX apps to wake up); Linux Host (FUSE); Container (VOLUME)]
Filesystem Sharing
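The event-injection step can be pictured as a translation from host-side change kinds to Linux inotify event bits. In this Python sketch the inotify values are the ones defined in `<sys/inotify.h>`, but the host-side kind names and the mapping itself are assumptions made for illustration; osxfs's real translation table is not shown in the slides.

```python
import enum


class Inotify(enum.IntFlag):
    """A few event bits from Linux <sys/inotify.h>."""
    MODIFY = 0x002
    ATTRIB = 0x004
    CREATE = 0x100
    DELETE = 0x200


# Hypothetical names for the kinds of change reported on the host side,
# mapped to the inotify event a container process would expect to see.
HOST_TO_INOTIFY = {
    "created": Inotify.CREATE,
    "removed": Inotify.DELETE,
    "modified": Inotify.MODIFY,
    "metadata": Inotify.ATTRIB,
}


def translate(host_events):
    """Fold a batch of host change kinds for one path into an inotify mask."""
    mask = Inotify(0)
    for kind in host_events:
        # Unknown kinds contribute no bits rather than raising.
        mask |= HOST_TO_INOTIFY.get(kind, Inotify(0))
    return mask
```

A file saved on the Mac might arrive as `["created", "modified"]`, which this sketch turns into `Inotify.CREATE | Inotify.MODIFY` for delivery inside the container.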
• Challenge: Deal with custom VPN software on the
host that makes it difficult to bridge.
• Solution: VPNKit efficiently reconstructs container
traffic into separate TCP/IP flows and translates
them into native OSX/Windows sockets.
[Diagram labels: OSX Host, Linux Host, Container; RUN <...>; com.docker.hyperkit-net; reconstruct traffic; TCP flows; translate to OSX socket calls; Ethernet bridge; DHCPv4; NTP]
Networking
[Diagram labels: OSX Host, Linux Host, Container; Privileged Port Service; EXPOSE; Port Service; VSock Binder; RUN <...>; VSock Listener; Userland Proxy]
• Challenge: Services publishing ports should be
exposed on localhost without needing VM info.
• Solution: VPNKit forwards container port requests
to an OSX service which binds them natively on its
external interface.
Networking
$ docker run resin/armv7hf-debian uname -a
Linux 7ed2fca7a3f0 4.1.12 #1 SMP Tue Jan 12 10:51:00
UTC 2016 armv7l GNU/Linux
$ docker run justincormack/ppc64le-debian uname -a
Linux edd13885f316 4.1.12 #1 SMP Tue Jan 12 10:51:00
UTC 2016 ppc64le GNU/Linux
Multi-CPU architectures

Contenu connexe

Tendances

Open Vulnerability Assesment System (OpenVAS)
Open Vulnerability Assesment System (OpenVAS)Open Vulnerability Assesment System (OpenVAS)
Open Vulnerability Assesment System (OpenVAS)
Information Technology Inistitute
 
virtualization and hypervisors
virtualization and hypervisorsvirtualization and hypervisors
virtualization and hypervisors
Gaurav Suri
 
Virtualization - Kernel Virtual Machine (KVM)
Virtualization - Kernel Virtual Machine (KVM)Virtualization - Kernel Virtual Machine (KVM)
Virtualization - Kernel Virtual Machine (KVM)
Wan Leung Wong
 

Tendances (20)

Hypervisor
HypervisorHypervisor
Hypervisor
 
Open Vulnerability Assesment System (OpenVAS)
Open Vulnerability Assesment System (OpenVAS)Open Vulnerability Assesment System (OpenVAS)
Open Vulnerability Assesment System (OpenVAS)
 
Server virtualization
Server virtualizationServer virtualization
Server virtualization
 
Virtualization
VirtualizationVirtualization
Virtualization
 
Virtual Machine
Virtual MachineVirtual Machine
Virtual Machine
 
virtualization and hypervisors
virtualization and hypervisorsvirtualization and hypervisors
virtualization and hypervisors
 
Virtual machine
Virtual machineVirtual machine
Virtual machine
 
Virtualization.ppt
Virtualization.pptVirtualization.ppt
Virtualization.ppt
 
Virtualization basics
Virtualization basics Virtualization basics
Virtualization basics
 
Virtual System
Virtual SystemVirtual System
Virtual System
 
Microsoft Hyper-V
Microsoft Hyper-VMicrosoft Hyper-V
Microsoft Hyper-V
 
What is Virtualization and its types & Techniques.What is hypervisor and its ...
What is Virtualization and its types & Techniques.What is hypervisor and its ...What is Virtualization and its types & Techniques.What is hypervisor and its ...
What is Virtualization and its types & Techniques.What is hypervisor and its ...
 
Virtualization
VirtualizationVirtualization
Virtualization
 
Virtualization - Kernel Virtual Machine (KVM)
Virtualization - Kernel Virtual Machine (KVM)Virtualization - Kernel Virtual Machine (KVM)
Virtualization - Kernel Virtual Machine (KVM)
 
risk based testing and regression testing
risk based testing and regression testingrisk based testing and regression testing
risk based testing and regression testing
 
Hypervisors
HypervisorsHypervisors
Hypervisors
 
Virtualization & cloud computing
Virtualization & cloud computingVirtualization & cloud computing
Virtualization & cloud computing
 
VMware Vsphere Graduation Project Presentation
VMware Vsphere Graduation Project PresentationVMware Vsphere Graduation Project Presentation
VMware Vsphere Graduation Project Presentation
 
Linux LVM Logical Volume Management
Linux LVM Logical Volume ManagementLinux LVM Logical Volume Management
Linux LVM Logical Volume Management
 
Linux basics
Linux basicsLinux basics
Linux basics
 

En vedette

En vedette (20)

containerd and CRI
containerd and CRIcontainerd and CRI
containerd and CRI
 
Docker Online Meetup: Announcing Docker CE + EE
Docker Online Meetup: Announcing Docker CE + EEDocker Online Meetup: Announcing Docker CE + EE
Docker Online Meetup: Announcing Docker CE + EE
 
Persistent storage tailored for containers
Persistent storage tailored for containersPersistent storage tailored for containers
Persistent storage tailored for containers
 
Driving containerd operations with gRPC
Driving containerd operations with gRPCDriving containerd operations with gRPC
Driving containerd operations with gRPC
 
Docker Networking: Control plane and Data plane
Docker Networking: Control plane and Data planeDocker Networking: Control plane and Data plane
Docker Networking: Control plane and Data plane
 
Containerd - core container runtime component
Containerd - core container runtime component Containerd - core container runtime component
Containerd - core container runtime component
 
Prometheus design and philosophy
Prometheus design and philosophy   Prometheus design and philosophy
Prometheus design and philosophy
 
Orchestrating Least Privilege by Diogo Monica
Orchestrating Least Privilege by Diogo Monica Orchestrating Least Privilege by Diogo Monica
Orchestrating Least Privilege by Diogo Monica
 
Online Meetup: What's new in docker 1.13.0
Online Meetup: What's new in docker 1.13.0 Online Meetup: What's new in docker 1.13.0
Online Meetup: What's new in docker 1.13.0
 
Using Docker Swarm Mode to Deploy Service Without Loss by Dongluo Chen & Nish...
Using Docker Swarm Mode to Deploy Service Without Loss by Dongluo Chen & Nish...Using Docker Swarm Mode to Deploy Service Without Loss by Dongluo Chen & Nish...
Using Docker Swarm Mode to Deploy Service Without Loss by Dongluo Chen & Nish...
 
containerd summit - Deep Dive into containerd
containerd summit - Deep Dive into containerdcontainerd summit - Deep Dive into containerd
containerd summit - Deep Dive into containerd
 
Docker and Microsoft - Windows Server 2016 Technical Deep Dive
Docker and Microsoft - Windows Server 2016 Technical Deep DiveDocker and Microsoft - Windows Server 2016 Technical Deep Dive
Docker and Microsoft - Windows Server 2016 Technical Deep Dive
 
Talking TUF: Securing Software Distribution
Talking TUF: Securing Software DistributionTalking TUF: Securing Software Distribution
Talking TUF: Securing Software Distribution
 
Cilium - BPF & XDP for containers
 Cilium - BPF & XDP for containers Cilium - BPF & XDP for containers
Cilium - BPF & XDP for containers
 
Infinit: Modern Storage Platform for Container Environments
Infinit: Modern Storage Platform for Container EnvironmentsInfinit: Modern Storage Platform for Container Environments
Infinit: Modern Storage Platform for Container Environments
 
Docker Online Meetup: Infrakit update and Q&A
Docker Online Meetup: Infrakit update and Q&ADocker Online Meetup: Infrakit update and Q&A
Docker Online Meetup: Infrakit update and Q&A
 
Docker Roadshow 2016
Docker Roadshow 2016Docker Roadshow 2016
Docker Roadshow 2016
 
Docker 101 - Nov 2016
Docker 101 - Nov 2016Docker 101 - Nov 2016
Docker 101 - Nov 2016
 
'The History of Metrics According to me' by Stephen Day
'The History of Metrics According to me' by Stephen Day'The History of Metrics According to me' by Stephen Day
'The History of Metrics According to me' by Stephen Day
 
Docker introduction
Docker introductionDocker introduction
Docker introduction
 

Similaire à Unikernels: the rise of the library hypervisor in MirageOS

2 Linux Container and Docker
2 Linux Container and Docker2 Linux Container and Docker
2 Linux Container and Docker
Fabio Fumarola
 
Rmll Virtualization As Is Tool 20090707 V1.0
Rmll Virtualization As Is Tool 20090707 V1.0Rmll Virtualization As Is Tool 20090707 V1.0
Rmll Virtualization As Is Tool 20090707 V1.0
guest72e8c1
 
Docker - Portable Deployment
Docker - Portable DeploymentDocker - Portable Deployment
Docker - Portable Deployment
javaonfly
 

Similaire à Unikernels: the rise of the library hypervisor in MirageOS (20)

Advanced Docker Developer Workflows on MacOS X and Windows
Advanced Docker Developer Workflows on MacOS X and WindowsAdvanced Docker Developer Workflows on MacOS X and Windows
Advanced Docker Developer Workflows on MacOS X and Windows
 
OSCON: Advanced Docker developer workflows on Mac OS and Windows
OSCON: Advanced Docker developer workflows on Mac OS and WindowsOSCON: Advanced Docker developer workflows on Mac OS and Windows
OSCON: Advanced Docker developer workflows on Mac OS and Windows
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and docker
 
2 Linux Container and Docker
2 Linux Container and Docker2 Linux Container and Docker
2 Linux Container and Docker
 
Develop with linux containers and docker
Develop with linux containers and dockerDevelop with linux containers and docker
Develop with linux containers and docker
 
RMLL / LSM 2009
RMLL / LSM 2009RMLL / LSM 2009
RMLL / LSM 2009
 
Rmll Virtualization As Is Tool 20090707 V1.0
Rmll Virtualization As Is Tool 20090707 V1.0Rmll Virtualization As Is Tool 20090707 V1.0
Rmll Virtualization As Is Tool 20090707 V1.0
 
Bare-metal, Docker Containers, and Virtualization: The Growing Choices for Cl...
Bare-metal, Docker Containers, and Virtualization: The Growing Choices for Cl...Bare-metal, Docker Containers, and Virtualization: The Growing Choices for Cl...
Bare-metal, Docker Containers, and Virtualization: The Growing Choices for Cl...
 
Docker - Portable Deployment
Docker - Portable DeploymentDocker - Portable Deployment
Docker - Portable Deployment
 
Introduction to Virtualization
Introduction to VirtualizationIntroduction to Virtualization
Introduction to Virtualization
 
The Lies We Tell Our Code (#seascale 2015 04-22)
The Lies We Tell Our Code (#seascale 2015 04-22)The Lies We Tell Our Code (#seascale 2015 04-22)
The Lies We Tell Our Code (#seascale 2015 04-22)
 
The lies we tell our code, LinuxCon/CloudOpen 2015-08-18
The lies we tell our code, LinuxCon/CloudOpen 2015-08-18The lies we tell our code, LinuxCon/CloudOpen 2015-08-18
The lies we tell our code, LinuxCon/CloudOpen 2015-08-18
 
Docker introduction
Docker introductionDocker introduction
Docker introduction
 
LOAD BALANCING OF APPLICATIONS USING XEN HYPERVISOR
LOAD BALANCING OF APPLICATIONS  USING XEN HYPERVISORLOAD BALANCING OF APPLICATIONS  USING XEN HYPERVISOR
LOAD BALANCING OF APPLICATIONS USING XEN HYPERVISOR
 
Linux container & docker
Linux container & dockerLinux container & docker
Linux container & docker
 
Docker Meetup 08 03-2016
Docker Meetup 08 03-2016Docker Meetup 08 03-2016
Docker Meetup 08 03-2016
 
Docker - Demo on PHP Application deployment
Docker - Demo on PHP Application deployment Docker - Demo on PHP Application deployment
Docker - Demo on PHP Application deployment
 
WSO2ConEU 2016 Tutorial - Deploying WSO2 Middleware on Containers
WSO2ConEU 2016 Tutorial - Deploying WSO2 Middleware on ContainersWSO2ConEU 2016 Tutorial - Deploying WSO2 Middleware on Containers
WSO2ConEU 2016 Tutorial - Deploying WSO2 Middleware on Containers
 
Deploying WSO2 Middleware on Containers
Deploying WSO2 Middleware on ContainersDeploying WSO2 Middleware on Containers
Deploying WSO2 Middleware on Containers
 
Cont0519
Cont0519Cont0519
Cont0519
 

Plus de Docker, Inc.

Build & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWSBuild & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWS
Docker, Inc.
 
Build & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWSBuild & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWS
Docker, Inc.
 

Plus de Docker, Inc. (20)

Containerize Your Game Server for the Best Multiplayer Experience
Containerize Your Game Server for the Best Multiplayer Experience Containerize Your Game Server for the Best Multiplayer Experience
Containerize Your Game Server for the Best Multiplayer Experience
 
How to Improve Your Image Builds Using Advance Docker Build
How to Improve Your Image Builds Using Advance Docker BuildHow to Improve Your Image Builds Using Advance Docker Build
How to Improve Your Image Builds Using Advance Docker Build
 
Build & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWSBuild & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWS
 
Securing Your Containerized Applications with NGINX
Securing Your Containerized Applications with NGINXSecuring Your Containerized Applications with NGINX
Securing Your Containerized Applications with NGINX
 
How To Build and Run Node Apps with Docker and Compose
How To Build and Run Node Apps with Docker and ComposeHow To Build and Run Node Apps with Docker and Compose
How To Build and Run Node Apps with Docker and Compose
 
Hands-on Helm
Hands-on Helm Hands-on Helm
Hands-on Helm
 
Distributed Deep Learning with Docker at Salesforce
Distributed Deep Learning with Docker at SalesforceDistributed Deep Learning with Docker at Salesforce
Distributed Deep Learning with Docker at Salesforce
 
The First 10M Pulls: Building The Official Curl Image for Docker Hub
The First 10M Pulls: Building The Official Curl Image for Docker HubThe First 10M Pulls: Building The Official Curl Image for Docker Hub
The First 10M Pulls: Building The Official Curl Image for Docker Hub
 
Monitoring in a Microservices World
Monitoring in a Microservices WorldMonitoring in a Microservices World
Monitoring in a Microservices World
 
COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...
COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...
COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...
 
Predicting Space Weather with Docker
Predicting Space Weather with DockerPredicting Space Weather with Docker
Predicting Space Weather with Docker
 
Become a Docker Power User With Microsoft Visual Studio Code
Become a Docker Power User With Microsoft Visual Studio CodeBecome a Docker Power User With Microsoft Visual Studio Code
Become a Docker Power User With Microsoft Visual Studio Code
 
How to Use Mirroring and Caching to Optimize your Container Registry
How to Use Mirroring and Caching to Optimize your Container RegistryHow to Use Mirroring and Caching to Optimize your Container Registry
How to Use Mirroring and Caching to Optimize your Container Registry
 
Monolithic to Microservices + Docker = SDLC on Steroids!
Monolithic to Microservices + Docker = SDLC on Steroids!Monolithic to Microservices + Docker = SDLC on Steroids!
Monolithic to Microservices + Docker = SDLC on Steroids!
 
Kubernetes at Datadog Scale
Kubernetes at Datadog ScaleKubernetes at Datadog Scale
Kubernetes at Datadog Scale
 
Labels, Labels, Labels
Labels, Labels, Labels Labels, Labels, Labels
Labels, Labels, Labels
 
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment Model
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment ModelUsing Docker Hub at Scale to Support Micro Focus' Delivery and Deployment Model
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment Model
 
Build & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWSBuild & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWS
 
From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...
From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...
From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...
 
Developing with Docker for the Arm Architecture
Developing with Docker for the Arm ArchitectureDeveloping with Docker for the Arm Architecture
Developing with Docker for the Arm Architecture
 

Dernier

Dernier (20)

Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Unikernels: the rise of the library hypervisor in MirageOS

  • 1. Unikernels: the Rise of the Library Hypervisor. Anil Madhavapeddy (@avsm), Mindy Preston (@yomimono), Martin Lucina, plus the MirageOS and Docker for Mac/Win teams. Docker Inc (@docker), with contributions from IBM. Docker Distributed Systems Summit, 7th October 2016, Berlin, Germany.
  • 2. Conventional hypervisors • Run full guest operating systems with complex emulation needs. • Scaffolding for device emulation, instruction emulation, etc. • Hard to compose into existing infrastructure without wrapping a full hypervisor layer. Diagram labels: Xen Hypervisor, qemu, xenstored, xenconsoled, Hardware, Dom0, DomU.
  • 3. Conventional hypervisors (continued) • CVE-2016-3710: VGA emulation missing bounds checks causes exploit. • CVE-2016-5403: unbounded virtio memory usage causes DoS. • CVE-2016-3672: unrestricted qemu logging causes DoS. • CVE-2015-8554: qemu-dm buffer overrun in MSI-X causes exploit. • CVE-2015-7504: heap overflow in pcnet emulator causes exploit.
  • 4. How can distributed systems use hardware protection more flexibly and composably?
  • 5. Recap: Unikernels • "library operating systems" break kernels into libraries. • Link libraries with a boot layer, scheduler and application. • Portable microservices that boot directly on hypervisors or Unix. Diagram labels: Xen Hardware App Linux Hardware Docker App Configuration Business Logic HTTP JSON SSL TCP/IP Xen Devices Unix libev Unix musl libc Application Libraries Libraries
  • 6. Recap: Unikernels (continued) • Many benefits are lost when deploying on existing clouds. • Tiny binaries (200k) still require scaffolding of a full OS to boot. • Difficult to manage hypervisor from inside a container as full host privilege is needed.
  • 7. Library Hypervisors • Extend the "kit" model and break down hypervisor functionality into libraries. • Expose core functionality (CPU and memory) as library, and other pieces (device emulation) are optional. • Benefit: huge reduction in TCB, and better fit to container-native infrastructure with privilege dropping. • Drawback: no existing support in operating systems.
  • 8. Library Hypervisors (continued): But let's take a closer look!
  • 12. Docker for Mac: aiming for a native OSX experience that works with existing developer workflows. • Easy drag-and-drop installation, and autoupdates to get the latest Docker. • Secure, sandboxed virtualisation architecture without elevated privileges. • Native networking support, with VPN and network sharing compatibility. • File sharing between container and host: uid mapping, inotify events, etc.
  • 13. Virtualisation • Uses the new HyperKit framework, which is in turn based on xHyve and FreeBSD's bHyve. • Sandbox friendly: processes largely run as non-root, with privileges of the local user.
  • 14. Virtualisation (diagram build). Diagram labels: OSX Kernel Hypervisor.framework Hardware virt: VMX, nested paging
  • 15. Virtualisation (diagram build). Diagram labels: OSX Kernel Userspace Hypervisor.framework User Process Thread/vCPU Traps on I/O pages Manages ACPI, PCI devices Hardware virt: VMX, nested paging
  • 16. Virtualisation (diagram build). Diagram labels: OSX Kernel Userspace Hypervisor.framework User Process Hardware virt: VMX, nested paging Process Linux Kernel VirtIO IPC VirtIO Block VirtIO Net Alpine Linux Userspace Latest Docker preconfigured QCow2 VPNKit Logs redirected to OSX host
  • 17. Virtualisation • Embeds Linux: includes an embedded lightweight Alpine Linux distribution optimised for fast boot and stateless operation for containers.
    $ docker info
    Containers: 358
     Running: 13
     Paused: 0
     Stopped: 345
    Images: 485
    Server Version: 1.11.1
    Storage Driver: aufs
     Root Dir: /var/lib/docker/aufs
     Backing Filesystem: extfs
     Dirperm1 Supported: true
    Logging Driver: json-file
    Cgroup Driver: cgroupfs
    Plugins:
     Volume: local
     Network: bridge null host
    Kernel Version: 4.4.9-moby
    Operating System: Alpine Linux v3.3
    OSType: linux
    Architecture: x86_64
    CPUs: 2
    Total Memory: 3.858 GiB
  • 18. HyperKit library structure • In HyperKit, most functionality is linked as a library. • If the app doesn't need a protocol, it is not linked and so is not part of the trusted computing base.
  • 19. Networking • Want to hide the gory details of virtualisation from the user. The Linux VM should be "invisible". • Not solving this leads to many user complaints: • VPN software and corporate installations do not like bridged virtual machines or custom routing. Result: container traffic cannot connect to the Internet. • Services cannot be exposed on localhost or the external interface and are instead on the Linux VM IP address. Result: breaks common web OAuth workflows.
  • 20. Networking. Diagram labels: OSX Kernel Userspace Hypervisor.framework HyperKit Hardware virt: VMX, nested paging VirtIO IPC VirtIO Block VirtIO Net
  • 21. Networking. Diagram labels: OSX Kernel Userspace Hypervisor.framework HyperKit Hardware virt: VMX, nested paging VirtIO IPC VirtIO Block VirtIO Net Ethernet In Containers! Containers! Containers!
  • 22. Networking. Diagram labels: OSX Kernel Userspace Hypervisor.framework HyperKit Hardware virt: VMX, nested paging VirtIO IPC VirtIO Block VirtIO Net Ethernet In Bridge Ethernet Kernel Module Containers! Containers! Containers!
  • 25. Networking. Diagram labels: OSX Kernel Userspace Hypervisor.framework HyperKit Hardware virt: VMX, nested paging VirtIO IPC VirtIO Block VirtIO Net Ethernet In VPNKit MirageOS TCP/IP DNS Socketer Kernel Sockets Containers! Containers! Containers! Source: github.com/docker/vpnkit
  • 26. Networking • Challenge: Deal with custom VPN software on the host that makes it difficult to bridge. • Solution: VPNKit efficiently reconstructs container traffic into separate TCP/IP flows and translates them into native OSX/Windows sockets. • Benefits: All network traffic is generated from normal socket calls (e.g. gethostbyaddr) on the Mac, so it interacts well with firewalls, VPNs, and any local security policies.
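
The "reconstruct into separate flows" step can be pictured with a small sketch. This is not VPNKit's code (VPNKit is built on the MirageOS TCP/IP stack); it is a simplified illustration, and the `Segment` type and `reassemble` function are hypothetical names. It assumes the TCP segments have already been parsed off the virtual Ethernet link, and it ignores handshakes, overlapping data, gaps and sequence-number wraparound.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# A flow is identified by (src_ip, src_port, dst_ip, dst_port).
FlowKey = Tuple[str, int, str, int]


@dataclass(frozen=True)
class Segment:
    """Hypothetical, already-parsed TCP segment from the container side."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    seq: int          # sequence number of the first payload byte
    payload: bytes


def reassemble(segments: Iterable[Segment]) -> Dict[FlowKey, bytes]:
    """Group segments by connection and join payloads in sequence order.

    Simplification: a retransmitted segment (same seq) is kept only once;
    overlaps, gaps and wraparound are not handled.
    """
    flows: Dict[FlowKey, Dict[int, bytes]] = defaultdict(dict)
    for s in segments:
        key = (s.src_ip, s.src_port, s.dst_ip, s.dst_port)
        flows[key].setdefault(s.seq, s.payload)  # keep first copy only
    return {
        key: b"".join(data for _, data in sorted(by_seq.items()))
        for key, by_seq in flows.items()
    }
```

Once a flow's byte stream is available, a host-side process could write it to an ordinary socket (for example one opened with `socket.create_connection`), which is why the traffic looks like normal application traffic to firewalls and VPN software.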
  • 27. Networking • Challenge: Services publishing ports should be exposed on localhost without needing VM info. • Solution: VPNKit forwards container port requests to an OSX service which binds them natively on its external interface. • Benefits: • docker run -P on the Mac now works without requiring any knowledge of the VM innards. • External OAuth workflows operate with web apps.
  • 28. Docker for Mac + unikernels • Native OSX application, uses HyperKit to virtualise for a domain-specific purpose ("docker run"). • Links MirageOS unikernel libraries for networking and storage translation between OS boundaries. • The library approach let us glue together these components really easily. • Docker for Mac is quite a complex distributed system internally, but this is (hopefully) hidden from the user.
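
The port-publishing idea (bind natively on the host, relay to the service inside the VM) can be sketched as a plain TCP forwarder. This is a minimal illustration, not the Docker for Mac implementation: the slides indicate the real path goes over VSock between host and VM, whereas this sketch uses ordinary TCP on both sides. The function names `forward`, `_handle` and `_pump` are hypothetical.

```python
import socket
import threading
from typing import Tuple

Address = Tuple[str, int]


def _pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes one way until src reaches EOF, then half-close dst."""
    try:
        while True:
            data = src.recv(65536)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass  # peer went away; treat as end of stream
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def _handle(client: socket.socket, target: Address) -> None:
    """Relay one accepted connection to the target in both directions."""
    try:
        upstream = socket.create_connection(target)
    except OSError:
        client.close()
        return
    with client, upstream:
        back = threading.Thread(target=_pump, args=(upstream, client), daemon=True)
        back.start()
        _pump(client, upstream)
        back.join()


def forward(listen_addr: Address, target: Address) -> socket.socket:
    """Bind listen_addr on the host and relay each connection to target.

    Returns the listening socket; closing it stops new connections.
    """
    server = socket.create_server(listen_addr)

    def accept_loop() -> None:
        while True:
            try:
                client, _ = server.accept()
            except OSError:
                return  # listener was closed
            threading.Thread(target=_handle, args=(client, target), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return server
```

For example, `forward(("127.0.0.1", 8080), (vm_ip, 80))` would expose a service on localhost, where `vm_ip` stands for an address the user never has to know about once the forwarder is managed for them.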
  • 29. MirageOS 3 + Solo5 • Unikernels have been gathering pace; the next challenge is to make them easily deployable. • Build handled via Docker, but docker run shouldn't need privileges (e.g. to start a VM). • MirageOS 3 has a new library hypervisor for Linux, developed by IBM, Docker and Cambridge University contributors. See mirage.io
  • 30. MirageOS 3 + Solo5 • Source: https://github.com/Solo5/solo5 • Runs as a Unix process and opens /dev/kvm for hardware isolation. • ukvm is a small, modular monitor that links only what is needed. Can be 10k in size! • Can run privilege separated: one process opens /dev/kvm, drops privileges, and executes the unikernel. • Boot times are the same as process fork times, since all the device setup is handled in-process.
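
The privilege-separation pattern described above (open the device while privileged, drop privileges, then execute) looks roughly like the sketch below. It is an illustration of the general Unix pattern only, not ukvm's code: the function name `launch_monitor` is hypothetical, and passing the descriptor number as the last command-line argument is an assumed convention, not ukvm's actual interface.

```python
import os
from typing import List


def launch_monitor(monitor_argv: List[str], uid: int, gid: int) -> None:
    """Open /dev/kvm while privileged, drop to uid/gid, then exec the monitor.

    On success this does not return: the current process image is replaced.
    """
    # 1. Acquire the privileged resource first.
    kvm_fd = os.open("/dev/kvm", os.O_RDWR)
    # Python marks new descriptors non-inheritable; keep this one across exec.
    os.set_inheritable(kvm_fd, True)

    # 2. Drop privileges. Order matters: supplementary groups and the group ID
    #    must change before the user ID, because setuid() removes the right
    #    to change them afterwards.
    os.setgroups([])
    os.setgid(gid)
    os.setuid(uid)

    # 3. Run the monitor unprivileged, handing over the open descriptor.
    #    Assumption: the monitor accepts the fd number as its last argument.
    os.execv(monitor_argv[0], monitor_argv + [str(kvm_fd)])
```

After the `setuid` call the process holds an open handle to `/dev/kvm` but no other elevated rights, so a bug in the monitor or the unikernel runs with the unprivileged user's permissions.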
  • 31. MirageOS 3 + Solo5 Source: Dan Williams and Ricardo Koller, IBM Research, HotCloud 16
  • 32. MirageOS 3 + Solo5 • Due for stable release in the next month. • Intended to be a "unikernel template" for other projects to share hypervisor code. • Liberally licensed under BSD/Apache2/ISC to encourage adoption and embedding. • BoF and tutorials tomorrow to demonstrate it. Developers are all here and hacking!
  • 33. Demo!
  • 34. How can distributed systems use hardware protection more flexibly and composably?
  • 35. Questions? Download free at docker.com Twitter: @avsm https://github.com/docker/hyperkit https://github.com/docker/vpnkit https://github.com/docker/datakit https://github.com/mirage/ We will be hacking tomorrow!
  • 37. Filesystem Sharing • Challenge: Share an arbitrary OSX directory tree into a Linux container without requiring extensive modification of either side. • Solution: Use a FUSE forwarding layer and translate Linux filesystem calls to OSX equivalents. Diagram labels: OSX Host Linux Host Container VOLUME com.docker.osxfs Track extra metadata Translate to OSX filesystem calls FUSE
  • 38. Filesystem Sharing • Challenge: Need filesystem activation so events on the Mac wake up container servers and vice versa. • Solution: osxfs uses the FSEvents API and injects inotify activation events into the container. Diagram labels: OSX Host Linux Host Container VOLUME com.docker.osxfs FSEvents watches open files Events from Linux cause OSX apps to wake up FUSE
  • 40. Networking • Challenge: Deal with custom VPN software on the host that makes it difficult to bridge. • Solution: VPNKit efficiently reconstructs container traffic into separate TCP/IP flows and translates them into native OSX/Windows sockets. Diagram labels: OSX Host Linux Host Container RUN <...> com.docker.hyperkit-net Reconstruct traffic TCP flows Translate to OSX socket calls Ethernet bridge DHCPv4 NTP
  • 41. Networking • Challenge: Services publishing ports should be exposed on localhost without needing VM info. • Solution: VPNKit forwards container port requests to an OSX service which binds them natively on its external interface. Diagram labels: OSX Host Linux Host Privileged Port Service Container EXPOSE Port Service VSock Binder RUN <...> VSock Listener Userland Proxy
  • 42. Multi-CPU architectures
    $ docker run resin/armv7hf-debian uname -a
    Linux 7ed2fca7a3f0 4.1.12 #1 SMP Tue Jan 12 10:51:00 UTC 2016 armv7l GNU/Linux
    $ docker run justincormack/ppc64le-debian uname -a
    Linux edd13885f316 4.1.12 #1 SMP Tue Jan 12 10:51:00 UTC 2016 ppc64le GNU/Linux