In this presentation, Matt Herreras and Josh Simons describe how a hybrid cloud powered by virtualization offers increased scientific agility for HPC workloads.
Make no mistake; virtualization is coming to HPC in a Big Way, and everyone will benefit.
Learn more: http://cto.vmware.com/author/joshsimons/
Watch the video presentation: http://wp.me/p3RLHQ-baU
2. Our Goal
Increase agility and decrease time to solution
for scientists and engineers using
virtualization and cloud technologies
3. Problem Statements
• Fundamental tension between end-users and administrators
– End-user need for compute environments customized and optimized for
their specific requirements
– Administrator need to deploy a secure, cost-effective, standard compute
infrastructure
• Which leads to
– Disconnected islands of compute with security risk and cost
inefficiencies
• Current HPC job scheduling approaches are not agile
– Job queuing delays can increase time to solution
– May overcommit physical resources on cluster nodes
• Which leads to
– Reduced cluster throughput and inherent inefficiencies
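The effect of queuing delay on time to solution can be illustrated with a small sketch. It assumes time to solution is simply queue wait plus run time, and that a virtualized run may carry a fractional overhead; the function name, parameters, and all numbers are hypothetical and are not measurements from the talk.

```python
def time_to_solution(queue_wait_h, run_time_h, overhead=0.0):
    """Hours from job submission to result.

    Assumed model: wait in the queue, then run. `overhead` is a
    hypothetical fractional slowdown (0.05 means 5% slower).
    """
    return queue_wait_h + run_time_h * (1.0 + overhead)


# Illustrative numbers only: a long queue wait on a shared cluster
# versus a short wait on self-provisioned virtual resources that
# run the same job slightly slower.
queued = time_to_solution(queue_wait_h=6.0, run_time_h=10.0)
self_provisioned = time_to_solution(queue_wait_h=0.5, run_time_h=10.0, overhead=0.05)

print(f"queued: {queued:.1f} h, self-provisioned: {self_provisioned:.1f} h")
```

Under this model, a modest run-time overhead can still shorten time to solution whenever it is smaller than the queue wait it removes.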
4. How Cloud and Virtualization Can Help
• Use virtualized HPC cloud infrastructure to simultaneously support the
needs of both scientists / engineers and administrators
• Leverage inherent virtual platform capabilities to increase cluster
throughput and increase efficiency
• Enable rapid end-user self-provisioning of new compute resources
• Provide policy-based enforcement of fair-share resource usage in
multi-tenant environments
• Automate secure, compliant policy-based workload separation
• Add fault recovery and avoidance capabilities to protect applications
from hardware failure, and to increase cluster throughput
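One way to picture policy-based fair-share enforcement is weighted max-min fair allocation. The sketch below is a generic illustration of that technique, not VMware's implementation; the function name, tenant names, weights, and core counts are all assumptions made for the example.

```python
def fair_share(capacity, demands, weights):
    """Weighted max-min fair allocation (progressive filling).

    No tenant receives more than it demands; capacity left over by
    small tenants is redistributed by weight among the rest.
    """
    alloc = {t: 0.0 for t in demands}
    active = {t for t, d in demands.items() if d > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_w = sum(weights[t] for t in active)
        # Offer each unsatisfied tenant its weighted slice of what is left.
        offer = {t: remaining * weights[t] / total_w for t in active}
        satisfied = {t for t in active if demands[t] <= offer[t]}
        if not satisfied:
            # Everyone wants more than their slice: hand out slices and stop.
            for t in active:
                alloc[t] = offer[t]
            break
        for t in satisfied:
            alloc[t] = float(demands[t])
            remaining -= demands[t]
        active -= satisfied
    return alloc


# Hypothetical tenants sharing 100 cores with equal weights.
shares = fair_share(
    capacity=100,
    demands={"group1": 20, "group2": 70, "group3": 70},
    weights={"group1": 1, "group2": 1, "group3": 1},
)
print(shares)
```

Here the small request is met in full and the two larger tenants split the remaining cores evenly; changing the weights shifts that split.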
5. End-User Customization
• Support groups with disparate software requirements
• Including root access
[Figure labels: App A, App B; OS A, OS B; Standard OS; virtualization layer (x3); hardware (x3)]
6. Workload Separation
• Secure multi-tenancy
• Fault isolation
• …and sometimes performance
[Figure labels: App A, App B; OS A, OS B; virtualization layer (x3); hardware (x3)]
8. Use Resources More Efficiently
• Avoid killing or pausing jobs
• Increase overall throughput
[Figure labels: App A, App B, App C, App D, App E; OS A, OS B, OS; virtualization layer (x3); hardware (x3)]
10. Secure Private and Public Cloud for HPC
[Figure: architecture diagram. Labels:
• Users: Research Group 1 … Research Group m
• IT
• Public Clouds
• VMware vCloud Automation Center: User Portals, Blueprints, Security
• VMware vCAC API: Programmatic Control and Integrations
• VMware vShield
• Research Cluster 1 … Research Cluster n
• VMware vCenter Server (x3) on VMware vSphere (x3)]
11. Summary
• New approach to delivering HPC resources using cloud and
virtualization technologies that can uniquely address the conflicting
needs of end-users and administrators
• Move away from traditional static job placement to a more
flexible, dynamic environment to raise throughput and increase
scientific agility
• VMware continues to drive virtualization platform improvements to
address an expanding range of HPC workloads
• Just a taste offered here. See papers and blog listed in Resources for
detailed information on workload performance, platform tuning, etc.
12. Resources
• CTO HPC blog:
– http://cto.vmware.com/tag/hpc
• ACM Operating Systems Review paper:
– Virtualizing High Performance Computing
• HPCvirt 2011 workshop papers:
– Performance Evaluation of HPC Benchmarks on VMware's ESX Server
• http://labs.vmware.com/publications/performance-evaluation-of-hpc-benchmarks-on-vmwares-esxi-server
– Virtualizing Performance Counters
• http://labs.vmware.com/publications/virtualizing-performance-counters
• Latency whitepaper:
– Best Practices for Performance Tuning of Latency-Sensitive Workloads in
vSphere VMs
• http://www.vmware.com/files/pdf/techpaper/VMW-Tuning-Latency-Sensitive-Workloads.pdf
13. Resources
• Big Data / Hadoop technical whitepaper
– Virtualized Hadoop Performance with VMware vSphere 5.1
• http://www.vmware.com/resources/techresources/10360
• InfiniBand performance
– RDMA Performance in Virtual Machines with QDR InfiniBand on vSphere 5
• http://labs.vmware.com/publications/ib-researchnote-apr2012
• Paravirtualized RDMA
– Toward a Paravirtual RDMA Device for VMware ESXi Guests
• http://labs.vmware.com/publications/vrdma-vmtj-winter2012