Top 15 Tips for vGPU Success - Part 3-3

Lee Bushen discusses his top tips for success during an NVIDIA vGPU proof of concept.

  2. TIP 1 – USE THE CORRECT INSTALL PROCESS
     Overview of the procedure:
     1. Do your research on hardware and software to make sure your configuration is supported.
     2. Build your hypervisor.
     3. Register for an evaluation license, then download the GRID software bundle and the license server from the licensing portal.
     4. Install an NVIDIA License Server and install the evaluation license file.
     5. Install the GPU card and the GPU Manager software on the hypervisor.
     6. Prepare the base VDI image without a GPU and configure RDP access.
     7. Configure the VDI image with a vGPU profile and boot the VM.
     8. Install the NVIDIA Windows driver and set the license server IP.
     9. Reboot and connect to the VM to check that the license was acquired.
     Deployment guides: https://www.nvidia.com/en-us/data-center/virtualization/resources/
  3. TIP 2 – GET AN EVALUATION LICENSE
     Best link yet: https://www.nvidia.com/object/vgpu-evaluation.html
     90-day trial license: you get 128 licenses (vApps, vPC, vDWS, vCS).
     It is always better to have the customer request the evaluation, because that makes it much easier to apply their paid licenses to the installation when they buy them.
     If you are a solutions partner, don't be tempted to think ahead and register for an evaluation on the customer's behalf: their license server will then be registered to your partner account, and it is harder to transfer later.
  4. TIP 3 - DRIVER DOWNLOAD LOCATION
     GRID drivers are downloaded from the Licensing Portal, not from the driver download pages.
     Screenshot callouts:
     1. Ignore this table.
     2. Click here to access the licensing portal.
     3. Drivers.
  5. TIP 4 - TURN OFF ECC
     If VMs won't boot, the GPU might need ECC turned off.*
     Callouts on the slide (steps 1 to 3):
     - If you see this …
     - Disable ECC for all cards, or disable ECC for a single card (for example card ID 00000000:02:00.0).
     - Reboot.
     * ECC is supported on Quadro and vCS profiles since vGPU 9.0.
     https://docs.nvidia.com/grid/latest/grid-software-quick-start-guide/index.html#disabling-enabling-ecc-memory
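The two "disable ECC" actions in Tip 4 can be sketched as command lines built in Python. This is a minimal sketch, assuming the `nvidia-smi -e 0` (all GPUs) and `nvidia-smi -i <id> -e 0` (one GPU) forms described in the linked NVIDIA quick start guide; the helper name `ecc_disable_command` is hypothetical, and the code only builds the argument list rather than running it.

```python
from typing import List, Optional


def ecc_disable_command(gpu_id: Optional[str] = None) -> List[str]:
    """Build the nvidia-smi argument list that disables ECC.

    gpu_id is the PCI bus ID of a single card (hypothetical helper
    parameter); None means "all cards".
    """
    cmd = ["nvidia-smi"]
    if gpu_id is not None:
        cmd += ["-i", gpu_id]   # restrict the change to one GPU
    cmd += ["-e", "0"]          # 0 = ECC disabled
    return cmd


if __name__ == "__main__":
    # Printed only; run the command on the hypervisor host, then reboot.
    print(" ".join(ecc_disable_command()))
    print(" ".join(ecc_disable_command("00000000:02:00.0")))
```

The host still needs the reboot mentioned on the slide before the new ECC setting takes effect.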
  6. TIP 5 - MEMORY ABOVE 1TB
     Maxwell cards can't see more than 1TB.
     - May be an issue with the M10; not an issue with "Pascal" cards or later.
     - Hypervisor support of IOMMU causes issues on servers with more than 1TB of RAM.
     - Relevant to ESXi and XenServer, not Nutanix AHV.
     - VM failures or crashes may occur.
     - Follow the documentation for XenServer and vSphere.
  7. TIP 6 - LICENSE SERVER
     Check out my video in this playlist! Follow the install process religiously!
  8. TIP 7 - ESXI GPU SETTINGS
     Tips for VMware customers:
     • Host > Configure > Graphics > Host Graphics.
     • Ensure "Shared Direct" is selected, or vGPU profiles will not be listed.
     • If needed, follow the highlighted steps (1 to 3 in the screenshot) to enable the vgpu.hotmigrate.enabled setting.
     • Ensure you have Enterprise Plus licenses; you NEED vCenter.
  9. TIP 8 - XENSERVER 7.5/7.6/8.0
     Private performance hotfix:
     • VMs with GPUs attached perform more slowly than on XenServer 7.1.
     • This can cause laggy graphics and slowdowns in general apps.
     • A private (hidden) hotfix is available from Citrix (reference SR78634793) or https://support.citrix.com/article/CTX250164
     • Hotfix XS80E003: https://support.citrix.com/article/CTX258320
     • Recommendation: move to Citrix Hypervisor 8.2 (or the latest version).
  10. TIP 9 - AVOID DRIVER MISMATCH
     Keep the GPU Manager and the VM's driver within the same major release.
     The slide grades host/guest pairings as Optimal, Supported or NOT Supported; see https://docs.nvidia.com/grid/
     Note: vGPU 11 now has cross-branch support.
  11. CROSS-BRANCH COMPATIBILITY
     [Diagram: guest and host driver pairings for vGPU 9.0, 9.1, 10.0, 10.1 and 11.0]
     - In-branch compatibility (pre vGPU 11).
     - Cross-branch compatibility (new in vGPU 11): a new host driver with the previous version of the guest driver is now supported.
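The compatibility rules on the two slides above can be written down as a small check. This is only a sketch of the slide's wording, not NVIDIA's official support matrix: it assumes "optimal" means identical versions, "supported" means the same major release, and that cross-branch support applies when the host is vGPU 11 or later and the guest is exactly one major release behind. The function names are hypothetical; always confirm against https://docs.nvidia.com/grid/.

```python
def parse_major(version: str) -> int:
    """Return the major release number from a version such as '10.1'."""
    return int(version.split(".")[0])


def driver_pairing(host: str, guest: str) -> str:
    """Classify a host (GPU Manager) / guest (VM driver) version pair.

    Assumptions (from the slides, not from an official matrix):
    - identical versions are 'optimal'
    - same major release is 'supported'
    - from vGPU 11, a host one major release ahead of the guest
      is also 'supported'
    """
    host_major, guest_major = parse_major(host), parse_major(guest)
    if host == guest:
        return "optimal"
    if host_major == guest_major:
        return "supported"
    if host_major >= 11 and guest_major == host_major - 1:
        return "supported"
    return "not supported"


if __name__ == "__main__":
    for host, guest in [("9.0", "9.0"), ("9.1", "9.0"),
                        ("10.0", "9.0"), ("11.0", "10.0")]:
        print(f"host {host} / guest {guest}: {driver_pairing(host, guest)}")
```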
  12. TIP 10 - MIXING PROFILES
     Profiles must be homogeneous per GPU.
     No mixed frame buffer sizes or license types on the same GPU (the first profile defines the type allowed). Cards with more than one GPU (NVIDIA M10 and M60) offer more flexibility.
     Example 1 (NVIDIA T4): mixing 4 GB and 2 GB frame buffers, such as 4B, 4B and 2B, is not allowed.
     Example 2 (Quadro RTX6000): mixing vDWS and vPC licenses, such as 4Q, 4Q and 4B, is not allowed.
  13. TIP 10 - MIXING PROFILES (CONTINUED)
     No mixed frame buffer sizes or license types on the same GPU, but cards with more than one GPU (NVIDIA M10 and M60) offer more flexibility.
     The NVIDIA M10 has 4 GPUs, each with 8GB of RAM, so mixing profiles across the card is possible.
     Example from the slide: 2Q profiles on one GPU, 1B profiles on another, a single 8A profile on a third and 4Q profiles on the fourth.
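The "first profile defines the type allowed" rule from the two Tip 10 slides can be illustrated with a short check. This is a sketch under stated assumptions: profile names end in a size and a type letter (for example `4B`, `2Q`, `T4-2Q`), the helper names are hypothetical, and the check covers only one physical GPU. It does not verify that the total frame buffer fits on the GPU.

```python
import re
from typing import List, Tuple


def parse_profile(name: str) -> Tuple[int, str]:
    """Split a profile such as 'T4-2Q' or '4B' into (size in GB, type letter)."""
    match = re.fullmatch(r"(\d+)([A-Za-z])", name.split("-")[-1])
    if match is None:
        raise ValueError(f"unrecognised profile name: {name!r}")
    return int(match.group(1)), match.group(2).upper()


def can_add_profile(running: List[str], new: str) -> bool:
    """Return True if 'new' may join the profiles already on one GPU.

    An empty GPU accepts anything; afterwards the first profile fixes
    both the frame buffer size and the license type.
    """
    if not running:
        return True
    return parse_profile(new) == parse_profile(running[0])


if __name__ == "__main__":
    print(can_add_profile(["4B", "4B"], "2B"))  # mixed frame buffer sizes
    print(can_add_profile(["4Q", "4Q"], "4B"))  # mixed license types
    print(can_add_profile(["2Q", "2Q"], "2Q"))  # homogeneous
```

On a multi-GPU card such as the M10, the same check would be applied to each physical GPU separately.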
  14. TIP 11 - BLACK VM CONSOLE
     Console sessions will go blank after the driver loads, so ensure you have enabled RDP access inside the VM before installing the driver.
     The exception is XenServer, which does show the VM's console in XenCenter.
  15. TIP 12 - ISSUES WITH DCH DRIVER
     Run Windows Update on the base image before attaching a vGPU profile.
     • DCH is a way of getting driver updates via Windows Update.
     • The NVIDIA DCH driver is not currently compatible with vGPU.
     • If Windows detects a GPU during Windows Update, it will install the DCH driver automatically. It is hard to revert the image to the vGPU driver afterwards.
     • Windows Update will not install DCH if it finds an existing vendor driver installed.
     • TIP: Do NOT run Windows Update between 1) adding a GPU and 2) installing the vGPU Windows driver.
     • https://nvidia.custhelp.com/app/answers/detail/a_id/4777/~/nvidia-dch%2Fstandard-display-drivers-for-windows-10-faq
  16. TIP 13 - VGPU LICENSE OPERATION
     License allocation process: a license is allocated when the VM starts.
     - VM off: 5 licenses available on the license server.
     - VM start: a license is checked out (5 becomes 4).
     - VM shutdown: the license is released (back to 5).
     Each VM holds its own trusted store.
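The checkout and release cycle on this slide can be mimicked with a toy model. This is purely illustrative: the `LicensePool` class and its methods are hypothetical and do not correspond to the NVIDIA License Server's real interface. It only reproduces the slide's counts (5 available, 4 while one VM runs, 5 again after shutdown).

```python
class LicensePool:
    """Toy model of a license server with a fixed number of licenses."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.checked_out = set()  # names of VMs currently holding a license

    @property
    def available(self) -> int:
        return self.total - len(self.checked_out)

    def vm_start(self, vm: str) -> bool:
        """Check out a license for a starting VM; False if none are left."""
        if vm in self.checked_out:
            return True
        if self.available == 0:
            return False
        self.checked_out.add(vm)
        return True

    def vm_shutdown(self, vm: str) -> None:
        """Release the license held by a VM that is shutting down."""
        self.checked_out.discard(vm)


if __name__ == "__main__":
    pool = LicensePool(total=5)
    pool.vm_start("vm-01")
    print(pool.available)   # one license in use
    pool.vm_shutdown("vm-01")
    print(pool.available)   # license returned to the pool
```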
  17. VGPU LICENSE OPERATION
     Issues with cloning a VM that has checked out a vGPU license:
     - Provisioning: PVS/MCS/Instant Clones/Linked Clones.
     - The trusted store of the golden image gets replicated to the clones.
  18. VGPU LICENSE OPERATION
     Solution #1 - PVS/MCS/Instant Clones/Fast Clones: remove the trusted store from the golden image before cloning.
     Delete all the files under "<SystemDrive>:\Program Files\NVIDIA Corporation\Grid Licensing\Trusted Storage" on the base vDisk image (if present).
     Note that these are hidden files with names like 'amsdudhygcfzzycwceeezwbpuyeugyjs'.
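Solution #1 amounts to emptying one directory on the golden image before cloning. The sketch below shows that step in Python; `clear_trusted_store` is a hypothetical helper, and the demo deliberately runs against a temporary directory rather than the real trusted storage path, which you would pass in yourself after checking it on your image.

```python
import tempfile
from pathlib import Path
from typing import List


def clear_trusted_store(store_dir: str) -> List[str]:
    """Delete every file directly under store_dir; return the names removed.

    A missing directory is not an error, matching the slide's "(if present)".
    """
    store = Path(store_dir)
    if not store.is_dir():
        return []
    removed = []
    for entry in list(store.iterdir()):  # snapshot first, then delete
        if entry.is_file():
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)


if __name__ == "__main__":
    # Demo on a throwaway directory standing in for the trusted store.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "amsdudhygcfzzycwceeezwbpuyeugyjs").write_text("demo")
        print(clear_trusted_store(tmp))
```

Run the real cleanup as the last step before sealing the image, so that no license is checked out again before cloning.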
  19. VGPU LICENSE OPERATION
     Solution #2 - Inject the license server details on VM boot.
     Use a golden image with no license server IP details set and with the trusted store cleared, then apply the settings on each clone at boot.
     https://docs.nvidia.com/grid/latest/grid-licensing-user-guide/index.html#windows-registry-grid-license-settings
     Sample REG file to run during boot:
       [HKEY_LOCAL_MACHINE\SOFTWARE\NVIDIA Corporation\Global\GridLicensing]
       "ServerAddress"=""
       "ServerPort"="7070"
       "BackupServerAddress"=""
       "BackupServerPort"="7070"
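The sample REG file for Solution #2 can be generated per environment instead of edited by hand. The following is a minimal sketch: the key and value names come from the slide, while the `build_reg_file` helper, the `Windows Registry Editor Version 5.00` header line and the example address `192.0.2.10` (a documentation-only IP) are assumptions added for illustration.

```python
KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\NVIDIA Corporation\Global\GridLicensing"


def build_reg_file(server: str, port: int = 7070,
                   backup_server: str = "", backup_port: int = 7070) -> str:
    """Return the text of a .reg file that sets the license server details."""
    lines = [
        "Windows Registry Editor Version 5.00",  # standard .reg header
        "",
        f"[{KEY}]",
        f'"ServerAddress"="{server}"',
        f'"ServerPort"="{port}"',
        f'"BackupServerAddress"="{backup_server}"',
        f'"BackupServerPort"="{backup_port}"',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder; substitute your own license server.
    print(build_reg_file("192.0.2.10"))
```

The resulting file could be imported by a boot-time script on each clone, which keeps the golden image itself free of license server details.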
  20. VGPU LICENSE OPERATION
     Connection loss to the license server(s): what happens if a VM cannot find a license server on boot, or loses its connection during operation (after the grace period has expired).
     - The grace period for running VMs is 24 hours since the last check-in.
     - The desktop is limited to 3 fps.
     - On vGPU profiles that support CUDA, CUDA is disabled.
     - GPU resource channels are limited, which will prevent some applications from running correctly.
     Note: vGPU 11.x has more relaxed grace period restrictions. There are no restrictions for 20 minutes after a VM has booted, then relaxed restrictions until 24 hours is reached. See the next slides.
  21. BEFORE VIRTUAL GPU 11
     [Timeline diagram, from VM boot to 24h]
     - Successful checkout: the 24h grace period applies; after it, full restriction (3 fps, CUDA restrictions).
     - Unsuccessful checkout: full restriction (3 fps, CUDA restrictions).
  22. NEW IN VIRTUAL GPU 11
     [Timeline diagram, from VM boot to 24h]
     - Successful checkout: full performance.
     - Unsuccessful checkout: no restrictions for the first 20 min after boot, then degraded (15 fps, some CUDA restrictions) until 24h, then full restriction (3 fps, further CUDA restrictions).
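The timelines on the last three slides can be summarised as a function of vGPU release and time since boot for a VM that failed to check out a license. This is one reading of the slides, not a statement of exact driver behaviour: the thresholds (20 minutes, 24 hours) and state names come from the slides, and the function name is hypothetical.

```python
def unlicensed_state(vgpu_major: int, minutes_since_boot: float) -> str:
    """State of a VM whose license checkout failed at boot.

    Before vGPU 11 the slides show full restriction straight away; from
    vGPU 11 there is an unrestricted window, then a degraded window.
    """
    if vgpu_major < 11:
        return "full restriction"      # 3 fps, CUDA restrictions
    if minutes_since_boot < 20:
        return "full performance"      # no restrictions yet
    if minutes_since_boot < 24 * 60:
        return "degraded"              # 15 fps, some CUDA restrictions
    return "full restriction"          # 3 fps, further CUDA restrictions


if __name__ == "__main__":
    for minutes in (5, 60, 25 * 60):
        print(minutes, unlicensed_state(11, minutes))
```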
  23. TIP 14 - GPU DOESN'T WORK UNDER XENAPP
     Policy and registry keys are required.
     Group Policy location: Computer Configuration => Administrative Templates => Windows Components => Remote Desktop Services => Remote Desktop Session Host => Remote Session Environment
     Registry settings for WPF/CUDA/OpenCL are also needed: https://docs.citrix.com/en-us/citrix-virtual-apps-desktops/graphics/hdx-3d-pro/gpu-acceleration-server.html
  24. TIP 15 - FUZZY FONTS
     Fuzzy fonts come from YUV 4:2:0 chroma subsampling when using a video codec (H.264/H.265); the slide compares this with YUV 4:4:4.
     • Try changing to the "Visually Lossless" policy.
     • Try the Bitmap codec.
     • The "Actively Changing Regions" policy is also good.
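Why 4:2:0 blurs coloured text can be seen with a tiny worked example. In 4:2:0 each 2x2 block of pixels shares a single chroma sample, so a one-pixel-wide coloured stroke is diluted. The sketch below assumes simple averaging of each 2x2 block, which is only one possible downsampling filter; real encoders may differ, and the function name is hypothetical.

```python
from typing import List


def subsample_420(plane: List[List[float]]) -> List[List[float]]:
    """Reduce a chroma plane to one sample per 2x2 block by averaging.

    The plane must have even width and height.
    """
    height, width = len(plane), len(plane[0])
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            block = (plane[y][x], plane[y][x + 1],
                     plane[y + 1][x], plane[y + 1][x + 1])
            row.append(sum(block) / 4)
        out.append(row)
    return out


if __name__ == "__main__":
    # A one-pixel-wide vertical stroke of strong colour (200) on neutral (0).
    chroma = [[0, 200, 0, 0],
              [0, 200, 0, 0]]
    print(subsample_420(chroma))  # the stroke's colour is halved and spread
```

With 4:4:4 the chroma plane is kept at full resolution, so the stroke keeps its original value, which is why the 4:4:4-based options in the bullets above give sharper text.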
  25. TROUBLESHOOTING CHECKLIST
     Information to collect for support:
     Item | Example
     Server: Hardware model | HP DL380 G10
     Server: Hypervisor make and version | VMware 6.7U3
     Server: GPU model and number of cards | 4 x NVIDIA T4
     Server: GPU Manager version (vib) | 430.67
     Server: Hardware (CPU, speed, RAM, disk, network) | HW spec
     Server: Other loads running | Non-GPU enabled VMs
     Remoting: Software (Horizon, XenDesktop etc.) | XenApp 7.15
     Remoting: Protocol policy (H.264/BMP etc.) | H.264 HW encode, Quality-Medium, ACR
     Remoting: Onsite or cloud | Onsite
     VDI: RAM and number of vCPUs | 24GB, 4 x vCPUs
     VDI: vGPU, Pass-Through/DDA or vSGA | vGPU
     VDI: Name of VM | If applicable
     VDI: GPU profile (if using vGPU) | T4-2Q
     VDI: OS version and build | Win10 1903
     VDI: NVIDIA driver version | 432.08
     VDI: Version of remoting agent | 7.15
     Network: Bandwidth, latency and user location | 20Mbps, 50ms, home
     Endpoint: Make/model | Dell/Wyse 5070
     Endpoint: OS | ThinOS
     Endpoint: Version of remoting client | Citrix WSA 1911
     Endpoint: Resolution/number of displays | 2 x 4K (3840x2160)
     Endpoint: Apps used | Office, Catia
     Endpoint: App characteristics | ProViz application
     Steps to reproduce issue | If applicable
  26. QUALIFICATION QUESTIONS FOR VGPU
     Discussion points for potential vGPU customers:
     Question | Typical answers
     What is the reason for this project? | Hardware upgrade, remote working, performance issue
     What is your workload? | Office-only apps, video, ProViz apps, Deep Learning/HPC
     What hardware do your users currently have? | Physical workstation, non-GPU VDI
     What endpoint hardware will you have? | Thin clients, laptops
     How many screens and what resolution? | Is 4K a target? Multiple 4K?
     What is your preferred hypervisor and remoting stack? | VMware ESX, Citrix, Horizon, Teradici
     What are your density aspirations? | High-density VDI, high-performance professional graphics
     On-site deployment or cloud? | Mostly, cloud uses a complete GPU, not fractions (vGPU)