GPU virtualization is in high demand for cloud use cases such as VDI and media processing. While Intel GVT-g (a.k.a. XenGT) enables these use cases on Intel Processor Graphics, new requirements are emerging, such as VM live migration with vGPU. In this session we will introduce the challenges of supporting vGPU live migration in the current migration framework, then describe the techniques that bring vGPU live migration to XenGT.
XPDS16: Live Migration of vGPU - Xiao Zheng, Intel Asia-Pacific Research & Development Ltd.
1. Live Migration of vGPU
Aug 2016
Xiao Zheng xiao.zheng@Intel.com
Kevin Tian kevin.tian@Intel.com
2. Agenda
• GPU Virtualization and vGPU Live Migration
• vGPU Resources
• Design and Solution
• Current Status
• Summary
3. GPU Virtualization Usage Cases
[Figure: usage cases. VDI: IT, office productivity, 2D/3D desktop, CADs. Media Cloud: media processing, 3D/media acceleration, … Desktops (Desktop#N) are served over the network via remote framebuffer streaming.]
4. XenGT Architecture: Mediated Pass-through
[Figure: XenGT architecture diagram. Labels: Xen, GPU, Host Linux, i915, vGT, MPT Services, Qemu (VGA, vPCI layout), I/O hooks, vGPU (one per VM), VM1, VM2, GFX driver, driver hooks, address space balloon.]
• Pass-through for performance-critical resources
• Trap and emulate for privileged resources
• Time-shared among VMs
5. vGPU Live Migration
Live Migration: Load balance, Maintenance, Fault recovery
Unfortunately, most vGPU solutions other than GVT-g do not support migration
[Figure: three vGPU approaches side by side. (1) Typically a GPU pass-through solution: HW with several GPUs, hypervisor, guest vGPU. (2) Typically a GPU SR-IOV solution: SR-IOV HW with GPU PF and VF … VF, hypervisor, guest vGPU. (3) GVT-g with mediated pass-through: the XenGT architecture diagram from slide 4.]
The GVT-g architecture (mediation) makes seamless live migration possible
6. Live Migration of vGPU in GVT-g
Highlights:
• GVT-g is an open source project; upstreaming is ongoing
• vGPU live migration follows the existing hypervisor migration flow
• 3D/2D/media graphics workloads migrate seamlessly between servers or on the local machine
• Supports Linux and Windows guests
• Live migration service downtime < 0.3 sec (guest RAM 2GB, 512MB assigned vGPU memory, 10Gbps adapter)
[Figure: VMs running office usage (VM1), CADs (VM2), media processing (VM3, VM4) and … VDI workloads on vGPU Server1 and vGPU Server2, with VM3 migrating between the two servers (shown as VM3*).]
How?
7. Demo: vGPU Live Migration with 3D workload
https://www.youtube.com/watch?v=y2SkU5JODIY
9. Inside a vGPU Instance
[Figure: the physical GPU (graphics memory, render engine, registers, interrupt) is presented by XenGT as vGPU instances, each with its own graphics memory, render engine, registers and interrupt, plus a GTT page table.]
• Pass-through for performance-critical resources
• Trap and emulate for privileged resources
• Time-shared among VMs
10. Challenges of Migrating a vGPU Instance
• When and how to migrate Graphics Memory
• When and how to migrate Guest Graphics Page Table
• When and how to migrate Render Engine State
Migration phases:
• Pre-copy: GPU dirty page logging
• Stop-and-copy: remove the vGPU from the scheduler; save and migrate all vGPU state
• Resume/post-copy: restore the vGPU state; add the vGPU into the scheduler
[Timeline: total migration time and service downtime are marked against these phases.]
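The phases listed on this slide can be sketched as a small driver function. This is an illustration only, not GVT-g code: the `VGpu` and `Scheduler` classes, their fields and the `migrate` function are hypothetical names introduced here, and the state transfer is reduced to copying a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class VGpu:
    """Hypothetical stand-in for one vGPU instance (not a GVT-g type)."""
    name: str
    state: dict = field(default_factory=dict)
    dirty_logging: bool = False


@dataclass
class Scheduler:
    """Hypothetical scheduler that time-shares the GPU among vGPUs."""
    runnable: list = field(default_factory=list)

    def add(self, vgpu: VGpu) -> None:
        self.runnable.append(vgpu.name)

    def remove(self, vgpu: VGpu) -> None:
        self.runnable.remove(vgpu.name)


def migrate(vgpu: VGpu, source: Scheduler, target: Scheduler) -> VGpu:
    # Pre-copy: the guest keeps running while dirty pages are logged.
    vgpu.dirty_logging = True

    # Stop-and-copy: the vGPU is descheduled, then all of its state is saved.
    source.remove(vgpu)
    saved = dict(vgpu.state)

    # Resume: restore the state on the target, then schedule the vGPU again.
    restored = VGpu(name=vgpu.name, state=saved)
    target.add(restored)
    return restored


src, dst = Scheduler(), Scheduler()
vm1 = VGpu("VM1", state={"registers": [1, 2, 3]})
src.add(vm1)
moved = migrate(vm1, src, dst)
print(src.runnable, dst.runnable, moved.state)
```

In this sketch the guest is unavailable only between `source.remove` and `target.add`, which corresponds to the service downtime on the timeline.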
11. Migration Policies for Different vGPU Resources
Resources: Context (Render Engine), GTT page table, Graphics Memory, Registers
Policies: Copy and Restore, Recreate Shadowing, Track Dirty and Copy, Recreate Shadowing
13. Guest GTT Page Table Migration
[Figure: the guest VM GTT view (GTT, 0 ~ 512MB) and the HW GGTT view, which is partitioned among VM0, VM1 and VM2, shown for the source and for the target after VM1 is migrated. GMA rebasing translates between the views (GM address = 0xA versus GM address = 0xB).]
GTT = GGTT or PPGTT
GMA = Graphics Memory Address
• Both GGTT and PPGTT are shadowed for the guest
• GGTT requires rebasing because the GGTT is partitioned among VMs
• The migration process is actually:
A. Copy the entire guest GTT page table
B. Re-create the shadow page table for the guest on the target side
C. Rebase GGTT addresses for GPU commands
Graphics Memory Address rebasing:
All vGPU commands from the guest need to be rebased onto the new address in GVT-g before being sent to the real GPU HW
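GMA rebasing can be illustrated with a short sketch. It assumes, for illustration only, that a VM's GGTT partition is a contiguous range and that rebasing keeps the offset within the partition while swapping the base; `GgttPartition`, `rebase_gma` and the example bases are hypothetical and are not taken from GVT-g.

```python
from dataclasses import dataclass

MB = 1 << 20


@dataclass(frozen=True)
class GgttPartition:
    """Hypothetical contiguous slice of the GGTT assigned to one VM."""
    base: int
    size: int

    def contains(self, gma: int) -> bool:
        return self.base <= gma < self.base + self.size


def rebase_gma(gma: int, source: GgttPartition, target: GgttPartition) -> int:
    """Translate a graphics memory address from the source partition
    to the same offset inside the target partition."""
    if not source.contains(gma):
        raise ValueError(f"GMA {gma:#x} is outside the source partition")
    offset = gma - source.base
    if offset >= target.size:
        raise ValueError(f"offset {offset:#x} does not fit the target partition")
    return target.base + offset


# Assumed example layout: the VM owned 512MB..1GB on the source host
# and is given 1GB..1.5GB on the target host.
source = GgttPartition(base=512 * MB, size=512 * MB)
target = GgttPartition(base=1024 * MB, size=512 * MB)

print(hex(rebase_gma(0x2000A000, source, target)))  # 0x4000a000
```

Every address embedded in a guest command would go through a translation of this kind before the command reaches the real hardware.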
14. Guest Graphics Memory Migration
• Pre-copy: log dirty graphics memory pages
• Stop-and-copy: migrate contents to the target
• Resume/post-copy: recreate the GTT page table based on the target mfn
Problem:
Intel® GPU page table entries have no Dirty or Accessed flags to track dirty pages
Solution:
Copy all used graphics memory to the target.
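[Figure: the guest VM GTT view of VM1 (GTT, 0 ~ 512MB) points to source mfns in the source host physical memory and, after migration, to target mfns in the target host physical memory; page contents are migrated from source to target.]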
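A minimal sketch of the "copy all used graphics memory" approach follows. It assumes a simplified model in which the guest GTT is a list whose entries are either a guest frame number or `None` for an unused entry; the function names, the dictionary-based memory and the frame numbers are hypothetical and only illustrate the idea.

```python
from typing import Dict, List, Optional

PAGE_SIZE = 4096


def collect_used_pages(guest_gtt: List[Optional[int]],
                       source_mem: Dict[int, bytes]) -> Dict[int, bytes]:
    """Return the contents of every page referenced by a valid GTT entry."""
    used = {}
    for gfn in guest_gtt:
        if gfn is not None:  # None models an invalid (unused) entry
            used[gfn] = source_mem[gfn]
    return used


def recreate_shadow_gtt(guest_gtt: List[Optional[int]],
                        gfn_to_target_mfn: Dict[int, int]) -> List[Optional[int]]:
    """Rebuild the shadow GTT on the target from the copied guest GTT."""
    return [None if gfn is None else gfn_to_target_mfn[gfn] for gfn in guest_gtt]


# Page 8 exists in source memory but no GTT entry maps it, so it is not copied.
guest_gtt = [7, None, 9, None]
source_mem = {
    7: b"\x01" * PAGE_SIZE,
    8: b"\x02" * PAGE_SIZE,
    9: b"\x03" * PAGE_SIZE,
}

pages = collect_used_pages(guest_gtt, source_mem)
shadow = recreate_shadow_gtt(guest_gtt, {7: 100, 9: 101})
print(sorted(pages), shadow)  # [7, 9] [100, None, 101, None]
```

Without Dirty or Accessed flags there is no way to narrow the set further, so every mapped page is treated as potentially modified.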
15. Render Engine State Migration
[Figure: on Server1, the vGPU guest has submitted contexts (CTX) to the render engine, each located at a GGTT address. Context0 and Context1 have completed, the HW is idle, and Render Context2, Render Context3 and Render Context4 are still in the queue. Migration happens at this point, and the queued render contexts (CTX N) are required to be migrated to the render engine on Server2, whose HW is also idle.]
o Intel® GPU HW is context based
o The CTX is located in GGTT memory
o Render engine state is contained within the CTX
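The selection shown on this slide can be sketched as a filter over the guest's submitted contexts. The `RenderContext` type, its fields and the GGTT addresses below are hypothetical and chosen only to mirror the figure (Context0 and Context1 completed, Context2 to Context4 still queued).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RenderContext:
    """Hypothetical record for one guest-submitted context."""
    ctx_id: int
    ggtt_address: int  # where the context image lives in GGTT memory
    completed: bool


def contexts_to_migrate(submitted: List[RenderContext]) -> List[RenderContext]:
    """With the HW idle, only contexts that have not completed carry
    render engine state that still has to move to the target."""
    return [ctx for ctx in submitted if not ctx.completed]


submitted = [
    RenderContext(0, 0x10000, completed=True),
    RenderContext(1, 0x20000, completed=True),
    RenderContext(2, 0x30000, completed=False),
    RenderContext(3, 0x40000, completed=False),
    RenderContext(4, 0x50000, completed=False),
]

pending = contexts_to_migrate(submitted)
print([ctx.ctx_id for ctx in pending])  # [2, 3, 4]
```

Because the contexts live in GGTT memory, their addresses would also be subject to the GMA rebasing described on slide 13.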
17. Current Status
• Experimental support for both KVMGT and XenGT
• Platforms: 5th/6th Generation Intel® Core™ Processors
• Benchmarks covered:
Windows guest: Heaven, 3DMark06, Tropics, media encoding/decoding
Linux guest: Lightsmark, 2D
• Quality: 12-hour overnight testing, migrating every 30 sec
• Timing (guest RAM 2GB including 512MB graphics memory, 10Gbps adapter):
Service downtime: ~0.3 sec
Total migration time: ~1.7 sec
[Figure: migration timeline (pre-copy with GPU dirty page logging; stop-and-copy; resume/post-copy) annotated with total migration time ~1.7 sec and service downtime ~0.3 sec.]
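As a rough sanity check on the reported total migration time, the time needed to push the guest RAM through the adapter can be computed. The assumptions here are not from the slides: 2GB is read as 2 GiB, the 10Gbps link is taken as fully utilized with no protocol overhead, and each page is sent exactly once.

```python
GIB = 1 << 30


def transfer_seconds(num_bytes: int, link_bits_per_second: float) -> float:
    """Ideal time to send num_bytes over a link at line rate."""
    return num_bytes * 8 / link_bits_per_second


guest_ram_bytes = 2 * GIB   # assumed reading of "Guest RAM 2GB"
link_rate = 10e9            # 10 Gbps adapter

seconds = transfer_seconds(guest_ram_bytes, link_rate)
print(round(seconds, 2))  # 1.72
```

Under these assumptions the raw transfer takes about 1.72 sec, which is in the same range as the ~1.7 sec total reported on the slide.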
18. Summary
• Need 3D/2D/Media workload in virtualization?
GVT-g is the choice
• Need GPU virtualization with migration support?
GVT-g is the choice
19. Resource Links
• Project webpage and releases: https://01.org/igvt-g
• Project public papers and documents: https://01.org/group/2230/documentation-list
• Intel® IDF: GVT-g in Media Cloud: https://01.org/sites/default/files/documentation/sz15_sfts002_100_engf.pdf
• XenGT introduction at the 2015 summit: http://events.linuxfoundation.org/sites/events/files/slides/XenGT-Xen%20Summit-REWRITE%203RD%20v4.pdf
• XenGT introduction at the 2014 summit: http://events.linuxfoundation.org/sites/events/files/slides/XenGT-LinuxCollaborationSummit-final_1.pdf