This document summarizes a presentation on adapting Xen for multi-tenant virtualization on ARM-based embedded devices with FPGA acceleration. It discusses moving PV drivers and I/O handling into the hypervisor to reduce overhead. Performance tests show MicroVisor reduces boot times and I/O latency compared to stock Xen. Integrating FPGA acceleration for networking and storage aims to improve performance for network function virtualization workloads on low-power edge devices.
Xen-lite for ARM: Adapting Xen for a Samsung Exynos MicroServer with Hybrid FPGA IO Acceleration
1. Xen-lite for ARM: Adapting Xen for a Samsung Exynos MicroServer with Hybrid FPGA IO Acceleration
Presented by: Julian Chesterfield, Chief Scientific Officer, OnApp Ltd, julian@onapp.com
Contributing work from: Anastassios Nanos, Xenia Ragiadakou, Michail Flouris (firstname.lastname@onapp.com)
2. Xen Summit, Budapest, July 13th 2017.
Setting the scene…
• increasing focus on embedded and integrated System-on-Chip hardware
• Mobile devices
• Autonomous vehicles
• Multi-tenant edge server devices
• ….
• ARM features prominently in this landscape as mobile-class processors become a more power-efficient alternative
• Significantly lower power, but much smaller resources (RAM, CPU, Network)
• Parallel growth of integrated accelerators such as GPU and/or FPGA hardware (co-processors: ARM/Xilinx, Intel/Altera)
3. Xen Summit, Budapest, July 13th 2017.
OnApp focus on HyperConverged Embedded Devices
• Hyper-Converged Infrastructure:
• Software Defined Compute (Hypervisor Virtualisation)
• Software Defined Networking (SDN, OpenFlow, etc.)
• Software Defined Storage (SDS)
• A fast-growing infrastructure orchestration trend in the enterprise DC
• SDS - Utilising commodity direct attached storage devices
• Software controlled distributed block storage for Virtual machines
• Software control is extremely advantageous
• fast dynamic reconfiguration
• feature updates
• no hardware appliance dependency
• But performance is significantly impacted
=> OnApp focuses on merging commodity virtualisation with hardware-accelerated IO
4. Xen Summit, Budapest, July 13th 2017.
OnApp/Kaleao Server Architecture
[Board photo: 12 x 13 cm compute node, showing the Samsung Exynos 7420 and the IO FPGAs]
• Hardware-accelerated I/O
• Low-power
• Share-nothing
• UNIMEM coherent memory access across compute nodes
• Samsung Galaxy S6 Exynos 7420 chipset
5. Xen Summit, Budapest, July 13th 2017.
Deployment
COMPUTE UNIT
• 1 big.LITTLE server
• 8x ARM 64-bit cores
• 128 GB NV-cache
• 4 GB DDR4 at 25 GB/s
• 20 Gb/s IO bandwidth
• 15 W peak power
NODE (4x compute units)
• 2x Zynq FPGA SoCs
• 7.68 TB NVMe SSD; storage at 1.9 GB/s (NVMe over Fabric)
• 2x 10Gb Ethernet
BLADE (4x nodes)
• 30.8 TB NVMe
• 2x 40 Gb/s
• Embedded 10/40Gb Ethernet switch
• Production A and B
3U CHASSIS (12 blades)
• 192x servers
• 1,536x cores
• 370 TB NVMe flash
• 48x 40GbE (960 Gb/s) Ethernet (stackable)
• 3 kW peak power
• External 48V
RACKS (standard 42U)
• 21,504 ARM 64-bit cores
• 10.752 TB LPDDR4
• 344 TB NV-cache
• 5.16 PB of NVMe SSD
• 13,440 Gb/s Ethernet
KALEAO KMAX
6. Xen Summit, Budapest, July 13th 2017.
KALEAO Integrated PCB (Compute Node)
[Board diagram: dedicated PCI lanes (4x PCI buses) with IO mirroring; FPGA-defined network virtualisation and FPGA-defined storage virtualisation]
7. Xen Summit, Budapest, July 13th 2017.
KALEAO Integrated PCB (Compute Node)
[Same board diagram as the previous slide]
Software Defined Hardware!!
8. Xen Summit, Budapest, July 13th 2017.
Emerging Software Defined Hardware IO Architectures
• KMAX represents a common emerging theme that other integrated SoC servers are moving towards
• Centralised, smart virtualisation of IO resources in hardware
• hardware mapping of virtualised IO across non-cache coherent endpoints
• embedded PCI or Fibre fabric
• Facebook ‘Yosemite’ architecture with Intel Xeon D processors
• NVMe over Fabric
10. Xen Summit, Budapest, July 13th 2017.
Multi-tenancy on low power ARM
• Multi-tenant server support is as important (if not more important) on ARM as on Intel architectures
• efficient utilisation of hardware resources
• application execution and isolation on critical systems (unikernels)
• ARM CPU architecture is really well suited to Xen!
• trap overhead from EL0/EL1 into EL2 is lower than on Intel (see the hypercall sketch below)
but…
Context-switch overhead of handling IO via Dom0 or a stub domain significantly overshadows any Type-1 architecture benefits
[“ARM Virtualization: Performance and Architectural Implications”, C. Dall et al., ISCA 2016]
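The cheap path the bullet refers to is visible in the hypercall ABI itself. The sketch below (an illustration based on Xen's public ARM ABI, not code from the slides) issues a hypercall on AArch64: the hypercall number travels in x16, arguments in x0..x4, and the hvc immediate tag is 0xEA1, so the guest enters EL2 and returns without any Dom0 round trip:

    /* Xen/ARM64 hypercall: a single hvc instruction traps straight into the
     * hypervisor at EL2 and back. AArch64 only; illustrative sketch. */
    static inline long xen_hypercall1(unsigned long op, unsigned long arg0)
    {
        register unsigned long x16 asm("x16") = op;   /* hypercall number */
        register unsigned long x0  asm("x0")  = arg0; /* first argument   */

        asm volatile("hvc #0xEA1"
                     : "+r"(x0)
                     : "r"(x16)
                     : "memory");

        return (long)x0;   /* result is returned in x0 */
    }

It is the IO path, where a request must additionally be scheduled into Dom0 at EL1, that erodes this advantage.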
11. Xen Summit, Budapest, July 13th 2017.
Third party comparisons of Xen vs KVM on ARM
• “operations such as accessing registers in the emulated GIC, sending virtual IPIs, and receiving virtual interrupts are much faster on Xen ARM than KVM ARM”
12. Xen Summit, Budapest, July 13th 2017.
Poor I/O performance on Xen vs KVM
• “a VM performing I/O has to communicate with Dom0 and not just the Xen hypervisor, which means not just trapping to EL2, but also going to EL1 to run Dom0”
13. Xen Summit, Budapest, July 13th 2017.
Revisiting some IO Architectural Assumptions for low power ARM SoCs
15. Xen Summit, Budapest, July 13th 2017.
1. Move PV backend support into the VMM layer
2. Implement a unified IO backend in the VMM layer, combining packet-switch logic with block-to-AoE (ATA over Ethernet) frame translation (items 1 and 2 are sketched after this list)
3. Experiment with a realtime service driver domain in EL1 (miniOS + integrated hardware driver) and/or an integrated driver in EL2
4. Integrate xenbus + xenstore into the VMM layer to reduce overhead and speed up device initialisation for integrated backend drivers
5. Lightweight network-based remote management interface
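As a rough illustration of items 1 and 2, the sketch below shows what a hypervisor-resident blkif backend loop could look like: it drains requests from the standard Xen shared ring (the RING_* macros, blkif_request_t and blkif_back_ring_t come from Xen's public io/ring.h and io/blkif.h headers) and re-frames each request as an ATA-over-Ethernet command for the FPGA NIC. This is a sketch under stated assumptions, not the MicroVisor code: fpga_nic_xmit(), the shelf/slot addressing, and the omitted grant mapping, ATA payload and response path are all placeholders.

    /* io/ring.h expects the embedding environment to supply barriers. */
    #define xen_mb()  __sync_synchronize()
    #define xen_rmb() __sync_synchronize()
    #define xen_wmb() __sync_synchronize()

    #include <stdint.h>
    #include <arpa/inet.h>     /* htons(); an EL2 build would use the VMM's helpers */
    #include <xen/xen.h>
    #include <xen/io/ring.h>   /* RING_* shared-ring macros */
    #include <xen/io/blkif.h>  /* blkif_request_t, blkif_back_ring_t */

    /* AoE header, following the published AoE spec (EtherType 0x88A2). */
    struct aoe_hdr {
        uint8_t  dst[6], src[6];
        uint16_t ethertype;    /* htons(0x88A2) */
        uint8_t  verfl;        /* protocol version in the high nibble */
        uint8_t  err;
        uint16_t major;        /* shelf address */
        uint8_t  minor;        /* slot address */
        uint8_t  cmd;          /* 0 = issue ATA command */
        uint32_t tag;          /* echoed back by the target */
    } __attribute__((packed));

    /* Assumed hook into the FPGA NIC transmit path. */
    extern void fpga_nic_xmit(const struct aoe_hdr *h, const blkif_request_t *req);

    /* Drain the guest's block ring and emit one AoE frame per request. */
    void blkback_consume(blkif_back_ring_t *ring)
    {
        RING_IDX rc = ring->req_cons;
        RING_IDX rp = ring->sring->req_prod;

        xen_rmb();  /* see the requests the frontend published */

        while (rc != rp) {
            blkif_request_t *req = RING_GET_REQUEST(ring, rc++);

            struct aoe_hdr h = {
                .ethertype = htons(0x88A2),
                .verfl     = 0x10,               /* AoE version 1 */
                .major     = htons(0),           /* shelf 0: placeholder */
                .minor     = 0,                  /* slot 0: placeholder */
                .cmd       = 0,                  /* ATA command follows */
                .tag       = (uint32_t)req->id,  /* correlate reply and ring slot */
            };
            /* The ATA section (LBA from req->sector_number, direction from
             * req->operation, data via granted pages) would be appended here. */
            fpga_nic_xmit(&h, req);
        }
        ring->req_cons = rc;
    }

With xenbus/xenstore also resident in EL2 (item 4), the frontend/backend handshake that normally bounces through Dom0 collapses into local calls, which is where the device-initialisation speedup comes from.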
Architectural Review
20. Xen Summit, Budapest, July 13th 2017.
Example Use Cases: NFV Service Function Chaining for the Edge
21. Xen Summit, Budapest, July 13th 2017.
• Rapid instantiation of virtualised network functions
• instantiate on demand
• fast boot and init required, e.g. to respond to realtime TCP connection instantiation
• Lightweight packet processing in software with hardware passthrough of accelerated NIC functions
• IP firewalls
• NAT gateways
• Custom traffic shapers
• FPGA handles SDN overlays and ethernet forwarding logic
Superfluidity: Network Function Virtualisation
22. Xen Summit, Budapest, July 13th 2017.
Superfluidity: Network Virtualisation Overlay Management
Drag-and-drop network function instances
25. Xen Summit, Budapest, July 13th 2017.
MicroVisor Guest Boot Time (vs Stock Xen)
• spawn guests in parallel
• start timer at spawn
• stop timer at the first ping from the guest (triggered from the last service in the boot chain); a harness sketch follows below
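A minimal harness in the spirit of this method could look like the sketch below. It is a simplification with assumed names (the guest config path and IP are placeholders), and it polls the guest with ping from the host rather than listening for the guest's own outbound ping:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static double now_s(void)                  /* monotonic seconds */
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(int argc, char **argv)
    {
        if (argc < 3) {
            fprintf(stderr, "usage: %s <guest.cfg> <guest-ip>\n", argv[0]);
            return 1;
        }

        char cmd[512];
        double t0 = now_s();                   /* start timer at spawn */

        snprintf(cmd, sizeof(cmd), "xl create %s", argv[1]);
        if (system(cmd) != 0)
            return 1;

        /* Stop the timer at the first successful ping (1 s timeout per try). */
        snprintf(cmd, sizeof(cmd), "ping -c1 -W1 %s >/dev/null 2>&1", argv[2]);
        while (system(cmd) != 0)
            ;

        printf("boot-to-first-ping: %.2f s\n", now_s() - t0);
        return 0;
    }

Running one instance per guest, started simultaneously, reproduces the parallel-spawn aspect of the measurement.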
26. Xen Summit, Budapest, July 13th 2017.
VM boot time breakdown
[Chart: boot time (s) vs number of VMs, MicroVisor (MV) vs stock Xen]
27. Xen Summit, Budapest, July 13th 2017.
Intra-node Communication Latency
• X-Gene R1 (8x A57) + SF 7000
28. Xen Summit, Budapest, July 13th 2017.
Inter-Node Communication Latency (raw ETH)
• Off-the-shelf Intel 10GbE
• 1-way latency
• No TCP/UDP protocols involved — custom raw ethernet latency tool (sketched below)
[Chart: latency in µs]
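The custom tool itself is not shown in the deck; below is a hypothetical raw-Ethernet echo probe in the same spirit, built on a Linux AF_PACKET socket. The interface name and the experimental EtherType 0x88B5 are assumptions, and it measures round-trip time against a peer that echoes frames back (halve for a rough one-way figure), whereas the slide reports one-way latency:

    #include <arpa/inet.h>
    #include <linux/if_packet.h>
    #include <net/ethernet.h>
    #include <net/if.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <time.h>
    #include <unistd.h>

    #define ETH_P_PROBE 0x88B5   /* local experimental EtherType: assumption */

    int main(void)               /* error handling omitted; run as root */
    {
        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_PROBE));
        struct sockaddr_ll sll = {
            .sll_family   = AF_PACKET,
            .sll_protocol = htons(ETH_P_PROBE),
            .sll_ifindex  = (int)if_nametoindex("eth0"),   /* assumption */
        };
        bind(fd, (struct sockaddr *)&sll, sizeof(sll));

        unsigned char frame[ETH_ZLEN] = {0};
        memset(frame, 0xff, ETH_ALEN);   /* broadcast destination MAC */
        frame[12] = ETH_P_PROBE >> 8;    /* EtherType, big-endian on the wire */
        frame[13] = ETH_P_PROBE & 0xff;

        struct sockaddr_ll from;
        socklen_t flen = sizeof(from);
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        send(fd, frame, sizeof(frame), 0);
        do {                             /* skip our own transmitted copy */
            recvfrom(fd, frame, sizeof(frame), 0,
                     (struct sockaddr *)&from, &flen);
        } while (from.sll_pkttype == PACKET_OUTGOING);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("RTT: %.1f us\n",
               (t1.tv_sec - t0.tv_sec) * 1e6 +
               (t1.tv_nsec - t0.tv_nsec) / 1e3);
        close(fd);
        return 0;
    }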
30. Xen Summit, Budapest, July 13th 2017.
Summary & Status
• Many-core, integrated, low-power SoC designs are becoming much more common across a variety of industries due to cost, power efficiency and performance
• Integrated hardware acceleration technology (FPGA, GPU) features prominently in emerging hardware
• Xen is well suited to multi-tenant operation on the ARM architecture but loses significant performance when processing I/O
• a particular problem for NFV on ARM edge devices
• Moving a minimal set of services into EL2 significantly improves performance
• PV backend drivers
• Integrated block and ethernet frame switch logic
• Xenbus/xenstore communication service
• Functional platform with all basic PV backend support and a lightweight remote management interface, implemented for both ARM and Intel platforms
• Integrated FPGA network/storage device drivers
• PoC for Intel ixgbe driver
• Xen-lite code changes will be open sourced shortly
31. Xen Summit, Budapest, July 13th 2017.
Thanks!
More info:
julian@onapp.com
https://onapp.com
https://superfluidity.eu
32. Xen Summit, Budapest, July 13th 2017.
This project has received funding from the European Union’s Horizon 2020 research
and innovation programme under Grant Agreement no 671566 (SUPERFLUIDITY)