Optimizing KVM virtual machines for performance
Boyan Krosnov
Chief of Product, StorPool Storage
OpenNebula TechDay Sofia, Bulgaria, 25 Feb 2016
Why optimize
● Better application performance -- e.g. time to load a page, time to rebuild, time to execute a specific query
● Happier customers (in cloud / multi-tenant environments)
● Lower cost per delivered resource (per VM)
  ○ through higher density
What
● Compute - CPU and memory
● Storage
● Network
● Do you optimize for throughput or for latency?
Where
● VM, guest OS, drivers, etc.
● Host OS and hypervisor
● Host hardware
● Network
● Storage system
A word on networks
Typically 2x 10GE per hypervisor for storage traffic, and the same or a separate 2x 10GE for internet and inter-VM traffic.
A typical cluster has just 2 switches, with up to 96x 10GE ports at low cost.
We are starting to see 40/56 GigE clusters, and we expect many 25 GigE networks in the next year.
VLANs, jumbo frames, flow control, RDMA (see the sketch below).
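A minimal iproute2 sketch of the jumbo-frame and VLAN setup; the NIC name eth1 and VLAN ID 100 are illustrative, not from the deck:

  # Enable jumbo frames on the storage-facing NIC (eth1 is an assumed name)
  ip link set dev eth1 mtu 9000
  # Carry storage traffic on its own VLAN (ID 100 is an assumed value)
  ip link add link eth1 name eth1.100 type vlan id 100
  ip link set dev eth1.100 mtu 9000 up

Jumbo frames only help if every switch port on the path also accepts 9000-byte frames; mismatched MTUs show up as silently dropped large packets.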
A word on host hardware
Typically:
- 2x E5-2697v3 -- 28 cores, 56 threads, @3.1GHz all-cores turbo
- 256-384-512 GB RAM
- 10/40 GigE NICs with RDMA
Also:
- Firmware versions and BIOS settings matter.
- Understand power management -- esp. C-states and P-states (inspection commands below).
- Think of rack-level optimization: how do we get the lowest total cost per delivered resource?
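One way to see the current C-state and P-state setup is the cpupower utility shipped with the kernel tools; a quick sketch:

  # P-states: governor, frequency limits, turbo status
  cpupower frequency-info
  # C-states: enabled idle states and their exit latencies
  cpupower idle-info

For latency-sensitive hosts, capping deep C-states (e.g. intel_idle.max_cstate=1 on the kernel command line) trades idle power for faster wakeups; benchmark before committing to it.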
Good references
RHEL 7 Virtualization Tuning and Optimization Guide
Also:
https://pve.proxmox.com/wiki/Performance_Tweaks
http://events.linuxfoundation.org/sites/events/files/slides/CloudOpen2013_Khoa_Huynh_v3.pdf
http://www.linux-kvm.org/images/f/f9/2012-forum-virtio-blk-performance-improvement.pdf
http://www.slideshare.net/janghoonsim/kvm-performance-optimization-for-ubuntu
… but don't trust everything you read. Perform your own benchmarking!
Host OS, guest OS
Use a recent Linux kernel, KVM and QEMU… but beware of the bleeding edge, e.g. qemu-kvm-ev from RHEV (repackaged by CentOS).
tuned-adm: the virtual-host profile on the hypervisor, virtual-guest in the VM (exact commands below).
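Spelled out, with a check that the profile actually took effect (tuned-adm requires the profile subcommand):

  # On the hypervisor
  tuned-adm profile virtual-host
  # Inside each guest
  tuned-adm profile virtual-guest
  # Verify the active profile / list what is available
  tuned-adm active
  tuned-adm list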
Networking
● Use the virtio-net driver
● regular virtio vs vhost_net
● SR-IOV (PCIe pass-through) -- libvirt sketch below
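A minimal libvirt sketch of both options; the bridge name br0 and the VF's PCI address are placeholders, not values from the deck:

  <!-- virtio-net NIC served by the in-kernel vhost_net backend -->
  <interface type='bridge'>
    <source bridge='br0'/>
    <model type='virtio'/>
    <driver name='vhost'/>
  </interface>

  <!-- SR-IOV: pass a virtual function through to the guest -->
  <interface type='hostdev' managed='yes'>
    <source>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x1'/>
    </source>
  </interface>

vhost_net keeps packet processing in the host kernel instead of bouncing through QEMU; SR-IOV removes the host from the data path entirely, at the price of losing live migration.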
Block I/O
● cache=none -- direct I/O, bypass the host buffer cache
● io=native -- use Linux native AIO, not POSIX AIO (threads)
● virtio-blk -> dataplane
● virtio-scsi -> multiqueue
● in guest: virtio_blk.queue_depth 128 -> 256 (disk stanza sketched below)
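A libvirt disk stanza that applies cache=none and io=native; the source device and target name are placeholders:

  <disk type='block' device='disk'>
    <driver name='qemu' type='raw' cache='none' io='native'/>
    <source dev='/dev/sdb'/>
    <target dev='vda' bus='virtio'/>
  </disk>

On the guest side, queue_depth is a virtio_blk module parameter on recent kernels, e.g. virtio_blk.queue_depth=256 on the kernel command line; read the current value back with:

  cat /sys/module/virtio_blk/parameters/queue_depth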
Compute - Memory
- balloon
- KSM (RAM dedup)
- huge pages, THP
- NUMA
  - use local-node memory if you can
  - route IRQs of network and storage adapters to a core on the node they are on
(hugepage / NUMA / IRQ sketch below)
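A sketch of the libvirt side (hugepage backing plus strict local-node allocation) and of the IRQ routing; node 0, interface eth1, IRQ 42 and the CPU range are all placeholders:

  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>

  # Find which NUMA node the adapter sits on, then steer its IRQ
  # to cores on that same node
  cat /sys/class/net/eth1/device/numa_node
  echo 0-13 > /proc/irq/42/smp_affinity_list

Note that irqbalance may rewrite these affinities; configure it accordingly or disable it on hosts with manual IRQ placement.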
Compute - CPU
Pinning
HT
NUMA
(vcpupin sketch below)
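A minimal pinning sketch, assuming the dual E5-2697v3 box from the hardware slide and the common enumeration where HT siblings are 28 CPUs apart (so CPUs 2 and 30 share a physical core); the vCPU count and cpusets are illustrative:

  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='30'/>
  </cputune>

The same can be done at runtime with virsh vcpupin <domain> <vcpu> <cpulist>. Whether to place a VM's vCPUs on HT siblings of one core or spread them across physical cores is workload-dependent; measure both.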
Demo
Boyan Krosnov
bk@storpool.com
@bkrosnov
https://storpool.com/