SlideShare a Scribd company logo
1 of 9
SR-IOV: The Key Enabling Technology for
Fully Virtualized HPC Clusters!
Glenn K. Lockwood!
Christopher Irving!
Philip M. Papadopoulos!
Mahidhar Tatineni!
Rick Wagner!
SAN DIEGO SUPERCOMPUTER CENTER
Single Root I/O Virtualization in HPC!
•  Problem: complex workflows demand increasing
flexibility from HPC platforms"
•  Virtualization = flexibility"
•  Virtualization = IO performance loss (e.g.,
excessive DMA interrupts)"
•  Solution: SR-IOV and Mellanox ConnectX-3
InfiniBand HCAs "
•  One physical function (PF) à multiple virtual
functions (VF), each with own DMA streams,
memory space, interrupts"
•  Allows DMA to bypass hypervisor to VMs!
SAN DIEGO SUPERCOMPUTER CENTER
High-Performance Virtualization
on Comet !
•  Mellanox FDR InfiniBand HCAs with SR-IOV"
•  Rocks and OpenStack Nova to manage VMs"
•  Flexibility to support complex science
gateways and web-based workflow engines"
•  custom compute appliances and virtual clusters
developed with FutureGrid and their existing
expertise"
•  backed by virtualized Lustre running over virtualized
InfiniBand"
SAN DIEGO SUPERCOMPUTER CENTER
Hardware/Software Configuration of Test Cluster !
Native, SR-IOV!

Amazon EC2!

Platform"

•  Rocks 6.1 (EL6)"
•  Virtualization via kvm	
  

•  Amazon Linux 2013.03 (EL6)"
•  cc2.8xlarge Instances"

CPUs"

•  2x Xeon E5-2660 (2.2GHz)"
•  16 cores per node"

•  2x Xeon E5-2670 (2.6GHz)"
•  16 cores per node"

RAM"

•  64 GB DDR3 DRAM"

•  60.5 DDR3 DRAM"

Interconnect" •  QDR4X InfiniBand"
•  10 GbE"
•  Mellanox ConnectX-3 (MT27500)" •  common placement group"
•  Intel VT-d, SR-IOV enabled in
firmware, kernel, drivers"
•  mlx4_core	
  1.1"
•  Mellanox OFED 2.0"
•  HCA firmware 2.11.1192"
SAN DIEGO SUPERCOMPUTER CENTER
50x less latency than Amazon EC2!
•  SR-IOV!
•  < 30% overhead for M <
128 bytes"
•  < 10% overhead for
eager send/recv"
•  Overhead à 0% for
bandwidth-limited regime"
•  Amazon EC2!
•  > 5000% worse latency"
•  Time dependent (noisy)"
5
	

SAN DIEGO SUPERCOMPUTER CENTER

OSU Microbenchmarks (3.9, osu_latency)"
10x more bandwidth than Amazon EC2!
•  SR-IOV!
•  < 2% bandwidth loss
over entire range"
•  > 95% peak bandwidth"
•  Amazon EC2!
•  < 35% peak bandwidth"
•  900% to 2500% worse
bandwidth than
virtualized InfiniBand"
6
	

SAN DIEGO SUPERCOMPUTER CENTER

OSU Microbenchmarks (3.9, osu_bw)"
Weather Modeling – 15% Overhead!
•  96-core (6-node)
calculation"
•  Nearest-neighbor
communication"
•  Scalable algorithms"
•  SR-IOV incurs modest
(15%) performance hit"
•  ...but still still 20%
faster*** than Amazon"
SAN DIEGO SUPERCOMPUTER CENTER

WRF 3.4.1 – 3hr forecast"
*** 20% faster despite SR-IOV cluster having 20% slower CPUs"
Quantum ESPRESSO: 5x Faster than EC2!
•  48-core, 3 node calc"
•  CG matrix inversion
(irregular comm.)"
•  3D FFT matrix
transposes (All-to-all
communication)"
•  28% slower w/ SR-IOV"
•  SR-IOV still > 500%
faster*** than EC2"
SAN DIEGO SUPERCOMPUTER CENTER

Quantum Espresso 5.0.2 – DEISA AUSURF112 benchmark"
*** 20% faster despite SR-IOV cluster having 20% slower CPUs"
Conclusions!
•  SR-IOV: huge step forward in high-performance virtualization"
•  Shows substantial improvement in latency over Amazon EC2, and
it provides nearly 0 bandwidth overhead!
•  Benchmark application performance confirms this: significant
improvement over EC2"

•  SR-IOV: lowers performance barrier to virtualizing the
interconnect and makes fully virtualized HPC clusters
viable!
•  Comet will deliver virtualized HPC to new/non-traditional
communities that need flexibility without major loss of performance!
SAN DIEGO SUPERCOMPUTER CENTER

More Related Content

What's hot

Andes RISC-V vector extension demystified-tutorial
Andes RISC-V vector extension demystified-tutorialAndes RISC-V vector extension demystified-tutorial
Andes RISC-V vector extension demystified-tutorialRISC-V International
 
Linux kernel memory allocators
Linux kernel memory allocatorsLinux kernel memory allocators
Linux kernel memory allocatorsHao-Ran Liu
 
DPDK in Containers Hands-on Lab
DPDK in Containers Hands-on LabDPDK in Containers Hands-on Lab
DPDK in Containers Hands-on LabMichelle Holley
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and moreBrendan Gregg
 
Static partitioning virtualization on RISC-V
Static partitioning virtualization on RISC-VStatic partitioning virtualization on RISC-V
Static partitioning virtualization on RISC-VRISC-V International
 
Ovs dpdk hwoffload way to full offload
Ovs dpdk hwoffload way to full offloadOvs dpdk hwoffload way to full offload
Ovs dpdk hwoffload way to full offloadKevin Traynor
 
Page cache in Linux kernel
Page cache in Linux kernelPage cache in Linux kernel
Page cache in Linux kernelAdrian Huang
 
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML AcceleratorsRISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML AcceleratorsRISC-V International
 
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203Linaro
 
Project ACRN hypervisor introduction
Project ACRN hypervisor introduction Project ACRN hypervisor introduction
Project ACRN hypervisor introduction Project ACRN
 
Linux KVM のコードを追いかけてみよう
Linux KVM のコードを追いかけてみようLinux KVM のコードを追いかけてみよう
Linux KVM のコードを追いかけてみようTsuyoshi OZAWA
 
Linux kernel architecture
Linux kernel architectureLinux kernel architecture
Linux kernel architectureSHAJANA BASHEER
 
Linux Initialization Process (1)
Linux Initialization Process (1)Linux Initialization Process (1)
Linux Initialization Process (1)shimosawa
 
BUD17-400: Secure Data Path with OPTEE
BUD17-400: Secure Data Path with OPTEE BUD17-400: Secure Data Path with OPTEE
BUD17-400: Secure Data Path with OPTEE Linaro
 

What's hot (20)

Andes RISC-V vector extension demystified-tutorial
Andes RISC-V vector extension demystified-tutorialAndes RISC-V vector extension demystified-tutorial
Andes RISC-V vector extension demystified-tutorial
 
Linux kernel memory allocators
Linux kernel memory allocatorsLinux kernel memory allocators
Linux kernel memory allocators
 
DPDK in Containers Hands-on Lab
DPDK in Containers Hands-on LabDPDK in Containers Hands-on Lab
DPDK in Containers Hands-on Lab
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
 
Pci express technology 3.0
Pci express technology 3.0Pci express technology 3.0
Pci express technology 3.0
 
Static partitioning virtualization on RISC-V
Static partitioning virtualization on RISC-VStatic partitioning virtualization on RISC-V
Static partitioning virtualization on RISC-V
 
Ovs dpdk hwoffload way to full offload
Ovs dpdk hwoffload way to full offloadOvs dpdk hwoffload way to full offload
Ovs dpdk hwoffload way to full offload
 
Page cache in Linux kernel
Page cache in Linux kernelPage cache in Linux kernel
Page cache in Linux kernel
 
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML AcceleratorsRISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
 
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203
 
PCI Drivers
PCI DriversPCI Drivers
PCI Drivers
 
Andes RISC-V processor solutions
Andes RISC-V processor solutionsAndes RISC-V processor solutions
Andes RISC-V processor solutions
 
eBPF maps 101
eBPF maps 101eBPF maps 101
eBPF maps 101
 
Memory model
Memory modelMemory model
Memory model
 
Project ACRN hypervisor introduction
Project ACRN hypervisor introduction Project ACRN hypervisor introduction
Project ACRN hypervisor introduction
 
Linux KVM のコードを追いかけてみよう
Linux KVM のコードを追いかけてみようLinux KVM のコードを追いかけてみよう
Linux KVM のコードを追いかけてみよう
 
UEFI presentation
UEFI presentationUEFI presentation
UEFI presentation
 
Linux kernel architecture
Linux kernel architectureLinux kernel architecture
Linux kernel architecture
 
Linux Initialization Process (1)
Linux Initialization Process (1)Linux Initialization Process (1)
Linux Initialization Process (1)
 
BUD17-400: Secure Data Path with OPTEE
BUD17-400: Secure Data Path with OPTEE BUD17-400: Secure Data Path with OPTEE
BUD17-400: Secure Data Path with OPTEE
 

Viewers also liked

Optimizing ARM cortex a and cortex-m based heterogeneous multiprocessor syste...
Optimizing ARM cortex a and cortex-m based heterogeneous multiprocessor syste...Optimizing ARM cortex a and cortex-m based heterogeneous multiprocessor syste...
Optimizing ARM cortex a and cortex-m based heterogeneous multiprocessor syste...Arm
 
A practical approach to securing embedded and io t platforms
A practical approach to securing embedded and io t platformsA practical approach to securing embedded and io t platforms
A practical approach to securing embedded and io t platformsArm
 
Developing functional safety systems with arm architecture solutions stroud
Developing functional safety systems with arm architecture solutions   stroudDeveloping functional safety systems with arm architecture solutions   stroud
Developing functional safety systems with arm architecture solutions stroudArm
 
Efficient software development with heterogeneous devices
Efficient software development with heterogeneous devicesEfficient software development with heterogeneous devices
Efficient software development with heterogeneous devicesArm
 
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...Glenn K. Lockwood
 
Software development in ar mv8 m architecture - yiu
Software development in ar mv8 m architecture - yiuSoftware development in ar mv8 m architecture - yiu
Software development in ar mv8 m architecture - yiuArm
 
So you think developing an SoC needs to be complex or expensive?
So you think developing an SoC needs to be complex or expensive?So you think developing an SoC needs to be complex or expensive?
So you think developing an SoC needs to be complex or expensive?Arm
 
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)inside-BigData.com
 
Microservices Tracing With Spring Cloud and Zipkin @Szczecin JUG
Microservices Tracing With Spring Cloud and Zipkin @Szczecin JUGMicroservices Tracing With Spring Cloud and Zipkin @Szczecin JUG
Microservices Tracing With Spring Cloud and Zipkin @Szczecin JUGMarcin Grzejszczak
 

Viewers also liked (9)

Optimizing ARM cortex a and cortex-m based heterogeneous multiprocessor syste...
Optimizing ARM cortex a and cortex-m based heterogeneous multiprocessor syste...Optimizing ARM cortex a and cortex-m based heterogeneous multiprocessor syste...
Optimizing ARM cortex a and cortex-m based heterogeneous multiprocessor syste...
 
A practical approach to securing embedded and io t platforms
A practical approach to securing embedded and io t platformsA practical approach to securing embedded and io t platforms
A practical approach to securing embedded and io t platforms
 
Developing functional safety systems with arm architecture solutions stroud
Developing functional safety systems with arm architecture solutions   stroudDeveloping functional safety systems with arm architecture solutions   stroud
Developing functional safety systems with arm architecture solutions stroud
 
Efficient software development with heterogeneous devices
Efficient software development with heterogeneous devicesEfficient software development with heterogeneous devices
Efficient software development with heterogeneous devices
 
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
 
Software development in ar mv8 m architecture - yiu
Software development in ar mv8 m architecture - yiuSoftware development in ar mv8 m architecture - yiu
Software development in ar mv8 m architecture - yiu
 
So you think developing an SoC needs to be complex or expensive?
So you think developing an SoC needs to be complex or expensive?So you think developing an SoC needs to be complex or expensive?
So you think developing an SoC needs to be complex or expensive?
 
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
 
Microservices Tracing With Spring Cloud and Zipkin @Szczecin JUG
Microservices Tracing With Spring Cloud and Zipkin @Szczecin JUGMicroservices Tracing With Spring Cloud and Zipkin @Szczecin JUG
Microservices Tracing With Spring Cloud and Zipkin @Szczecin JUG
 

Similar to SR-IOV: The Key Enabling Technology for Fully Virtualized HPC Clusters

AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...Ryousei Takano
 
Sharing High-Performance Interconnects Across Multiple Virtual Machines
Sharing High-Performance Interconnects Across Multiple Virtual MachinesSharing High-Performance Interconnects Across Multiple Virtual Machines
Sharing High-Performance Interconnects Across Multiple Virtual Machinesinside-BigData.com
 
Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...
Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...
Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...inside-BigData.com
 
Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...Fwdays
 
Platforms for Accelerating the Software Defined and Virtual Infrastructure
Platforms for Accelerating the Software Defined and Virtual InfrastructurePlatforms for Accelerating the Software Defined and Virtual Infrastructure
Platforms for Accelerating the Software Defined and Virtual Infrastructure6WIND
 
Arquitetura Hibrida - Integrando seu Data Center com a Nuvem da AWS
Arquitetura Hibrida - Integrando seu Data Center com a Nuvem da AWSArquitetura Hibrida - Integrando seu Data Center com a Nuvem da AWS
Arquitetura Hibrida - Integrando seu Data Center com a Nuvem da AWSAmazon Web Services LATAM
 
The von Neumann Memory Barrier and Computer Architectures for the 21st Century
The von Neumann Memory Barrier and Computer Architectures for the 21st CenturyThe von Neumann Memory Barrier and Computer Architectures for the 21st Century
The von Neumann Memory Barrier and Computer Architectures for the 21st CenturyPerry Lea
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...rhatr
 
High Performance Cloud Computing
High Performance Cloud ComputingHigh Performance Cloud Computing
High Performance Cloud ComputingAmazon Web Services
 
High Performance Cloud Computing
High Performance Cloud ComputingHigh Performance Cloud Computing
High Performance Cloud ComputingAmazon Web Services
 
#VMUGMTL - Xsigo Breakout
#VMUGMTL - Xsigo Breakout#VMUGMTL - Xsigo Breakout
#VMUGMTL - Xsigo Breakout1CloudRoad.com
 
The Pace of Innovation - Pop-up Loft Tel Aviv
The Pace of Innovation - Pop-up Loft Tel AvivThe Pace of Innovation - Pop-up Loft Tel Aviv
The Pace of Innovation - Pop-up Loft Tel AvivAmazon Web Services
 
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...Paul Brebner
 
Real-time DeepLearning on IoT Sensor Data
Real-time DeepLearning on IoT Sensor DataReal-time DeepLearning on IoT Sensor Data
Real-time DeepLearning on IoT Sensor DataRomeo Kienzler
 
Cassandra Summit 2014: Active-Active Cassandra Behind the Scenes
Cassandra Summit 2014: Active-Active Cassandra Behind the ScenesCassandra Summit 2014: Active-Active Cassandra Behind the Scenes
Cassandra Summit 2014: Active-Active Cassandra Behind the ScenesDataStax Academy
 
ISC Cloud 2013 - Cloud Architectures for HPC – Industry Case Studies
 ISC Cloud 2013 - Cloud Architectures for HPC – Industry Case Studies ISC Cloud 2013 - Cloud Architectures for HPC – Industry Case Studies
ISC Cloud 2013 - Cloud Architectures for HPC – Industry Case StudiesOpenNebula Project
 
ISC Cloud 2013 - Cloud Architectures for HPC – Industry Case Studies
 ISC Cloud 2013 - Cloud Architectures for HPC – Industry Case Studies ISC Cloud 2013 - Cloud Architectures for HPC – Industry Case Studies
ISC Cloud 2013 - Cloud Architectures for HPC – Industry Case StudiesIgnacio M. Llorente
 
ClickOS_EE80777777777777777777777777777.pptx
ClickOS_EE80777777777777777777777777777.pptxClickOS_EE80777777777777777777777777777.pptx
ClickOS_EE80777777777777777777777777777.pptxBiHongPhc
 

Similar to SR-IOV: The Key Enabling Technology for Fully Virtualized HPC Clusters (20)

AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...
 
Sharing High-Performance Interconnects Across Multiple Virtual Machines
Sharing High-Performance Interconnects Across Multiple Virtual MachinesSharing High-Performance Interconnects Across Multiple Virtual Machines
Sharing High-Performance Interconnects Across Multiple Virtual Machines
 
Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...
Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...
Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...
 
Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...
 
Platforms for Accelerating the Software Defined and Virtual Infrastructure
Platforms for Accelerating the Software Defined and Virtual InfrastructurePlatforms for Accelerating the Software Defined and Virtual Infrastructure
Platforms for Accelerating the Software Defined and Virtual Infrastructure
 
Arquitetura Hibrida - Integrando seu Data Center com a Nuvem da AWS
Arquitetura Hibrida - Integrando seu Data Center com a Nuvem da AWSArquitetura Hibrida - Integrando seu Data Center com a Nuvem da AWS
Arquitetura Hibrida - Integrando seu Data Center com a Nuvem da AWS
 
The von Neumann Memory Barrier and Computer Architectures for the 21st Century
The von Neumann Memory Barrier and Computer Architectures for the 21st CenturyThe von Neumann Memory Barrier and Computer Architectures for the 21st Century
The von Neumann Memory Barrier and Computer Architectures for the 21st Century
 
Big Data Management at CERN: The CMS Example
Big Data Management at CERN: The CMS ExampleBig Data Management at CERN: The CMS Example
Big Data Management at CERN: The CMS Example
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...
 
High Performance Cloud Computing
High Performance Cloud ComputingHigh Performance Cloud Computing
High Performance Cloud Computing
 
High Performance Cloud Computing
High Performance Cloud ComputingHigh Performance Cloud Computing
High Performance Cloud Computing
 
#VMUGMTL - Xsigo Breakout
#VMUGMTL - Xsigo Breakout#VMUGMTL - Xsigo Breakout
#VMUGMTL - Xsigo Breakout
 
The Pace of Innovation - Pop-up Loft Tel Aviv
The Pace of Innovation - Pop-up Loft Tel AvivThe Pace of Innovation - Pop-up Loft Tel Aviv
The Pace of Innovation - Pop-up Loft Tel Aviv
 
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
 
Real-time DeepLearning on IoT Sensor Data
Real-time DeepLearning on IoT Sensor DataReal-time DeepLearning on IoT Sensor Data
Real-time DeepLearning on IoT Sensor Data
 
Cassandra Summit 2014: Active-Active Cassandra Behind the Scenes
Cassandra Summit 2014: Active-Active Cassandra Behind the ScenesCassandra Summit 2014: Active-Active Cassandra Behind the Scenes
Cassandra Summit 2014: Active-Active Cassandra Behind the Scenes
 
ISC Cloud 2013 - Cloud Architectures for HPC – Industry Case Studies
 ISC Cloud 2013 - Cloud Architectures for HPC – Industry Case Studies ISC Cloud 2013 - Cloud Architectures for HPC – Industry Case Studies
ISC Cloud 2013 - Cloud Architectures for HPC – Industry Case Studies
 
ISC Cloud 2013 - Cloud Architectures for HPC – Industry Case Studies
 ISC Cloud 2013 - Cloud Architectures for HPC – Industry Case Studies ISC Cloud 2013 - Cloud Architectures for HPC – Industry Case Studies
ISC Cloud 2013 - Cloud Architectures for HPC – Industry Case Studies
 
ClickOS_EE80777777777777777777777777777.pptx
ClickOS_EE80777777777777777777777777777.pptxClickOS_EE80777777777777777777777777777.pptx
ClickOS_EE80777777777777777777777777777.pptx
 

More from Glenn K. Lockwood

Understanding and Measuring I/O Performance
Understanding and Measuring I/O PerformanceUnderstanding and Measuring I/O Performance
Understanding and Measuring I/O PerformanceGlenn K. Lockwood
 
ASCI Terascale Simulation Requirements and Deployments
ASCI Terascale Simulation Requirements and DeploymentsASCI Terascale Simulation Requirements and Deployments
ASCI Terascale Simulation Requirements and DeploymentsGlenn K. Lockwood
 
Large-scale Genomic Analysis Enabled by Gordon
Large-scale Genomic Analysis Enabled by GordonLarge-scale Genomic Analysis Enabled by Gordon
Large-scale Genomic Analysis Enabled by GordonGlenn K. Lockwood
 
Hadoop Streaming: Programming Hadoop without Java
Hadoop Streaming: Programming Hadoop without JavaHadoop Streaming: Programming Hadoop without Java
Hadoop Streaming: Programming Hadoop without JavaGlenn K. Lockwood
 

More from Glenn K. Lockwood (7)

Understanding and Measuring I/O Performance
Understanding and Measuring I/O PerformanceUnderstanding and Measuring I/O Performance
Understanding and Measuring I/O Performance
 
Parallel R and Hadoop
Parallel R and HadoopParallel R and Hadoop
Parallel R and Hadoop
 
ASCI Terascale Simulation Requirements and Deployments
ASCI Terascale Simulation Requirements and DeploymentsASCI Terascale Simulation Requirements and Deployments
ASCI Terascale Simulation Requirements and Deployments
 
Overview of Spark for HPC
Overview of Spark for HPCOverview of Spark for HPC
Overview of Spark for HPC
 
myHadoop 0.30
myHadoop 0.30myHadoop 0.30
myHadoop 0.30
 
Large-scale Genomic Analysis Enabled by Gordon
Large-scale Genomic Analysis Enabled by GordonLarge-scale Genomic Analysis Enabled by Gordon
Large-scale Genomic Analysis Enabled by Gordon
 
Hadoop Streaming: Programming Hadoop without Java
Hadoop Streaming: Programming Hadoop without JavaHadoop Streaming: Programming Hadoop without Java
Hadoop Streaming: Programming Hadoop without Java
 

Recently uploaded

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 

Recently uploaded (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

SR-IOV: The Key Enabling Technology for Fully Virtualized HPC Clusters

  • 1. SR-IOV: The Key Enabling Technology for Fully Virtualized HPC Clusters! Glenn K. Lockwood! Christopher Irving! Philip M. Papadopoulos! Mahidhar Tatineni! Rick Wagner! SAN DIEGO SUPERCOMPUTER CENTER
  • 2. Single Root I/O Virtualization in HPC! •  Problem: complex workflows demand increasing flexibility from HPC platforms" •  Virtualization = flexibility" •  Virtualization = IO performance loss (e.g., excessive DMA interrupts)" •  Solution: SR-IOV and Mellanox ConnectX-3 InfiniBand HCAs " •  One physical function (PF) à multiple virtual functions (VF), each with own DMA streams, memory space, interrupts" •  Allows DMA to bypass hypervisor to VMs! SAN DIEGO SUPERCOMPUTER CENTER
  • 3. High-Performance Virtualization on Comet ! •  Mellanox FDR InfiniBand HCAs with SR-IOV" •  Rocks and OpenStack Nova to manage VMs" •  Flexibility to support complex science gateways and web-based workflow engines" •  custom compute appliances and virtual clusters developed with FutureGrid and their existing expertise" •  backed by virtualized Lustre running over virtualized InfiniBand" SAN DIEGO SUPERCOMPUTER CENTER
  • 4. Hardware/Software Configuration of Test Cluster ! Native, SR-IOV! Amazon EC2! Platform" •  Rocks 6.1 (EL6)" •  Virtualization via kvm   •  Amazon Linux 2013.03 (EL6)" •  cc2.8xlarge Instances" CPUs" •  2x Xeon E5-2660 (2.2GHz)" •  16 cores per node" •  2x Xeon E5-2670 (2.6GHz)" •  16 cores per node" RAM" •  64 GB DDR3 DRAM" •  60.5 DDR3 DRAM" Interconnect" •  QDR4X InfiniBand" •  10 GbE" •  Mellanox ConnectX-3 (MT27500)" •  common placement group" •  Intel VT-d, SR-IOV enabled in firmware, kernel, drivers" •  mlx4_core  1.1" •  Mellanox OFED 2.0" •  HCA firmware 2.11.1192" SAN DIEGO SUPERCOMPUTER CENTER
  • 5. 50x less latency than Amazon EC2! •  SR-IOV! •  < 30% overhead for M < 128 bytes" •  < 10% overhead for eager send/recv" •  Overhead à 0% for bandwidth-limited regime" •  Amazon EC2! •  > 5000% worse latency" •  Time dependent (noisy)" 5 SAN DIEGO SUPERCOMPUTER CENTER OSU Microbenchmarks (3.9, osu_latency)"
  • 6. 10x more bandwidth than Amazon EC2! •  SR-IOV! •  < 2% bandwidth loss over entire range" •  > 95% peak bandwidth" •  Amazon EC2! •  < 35% peak bandwidth" •  900% to 2500% worse bandwidth than virtualized InfiniBand" 6 SAN DIEGO SUPERCOMPUTER CENTER OSU Microbenchmarks (3.9, osu_bw)"
  • 7. Weather Modeling – 15% Overhead! •  96-core (6-node) calculation" •  Nearest-neighbor communication" •  Scalable algorithms" •  SR-IOV incurs modest (15%) performance hit" •  ...but still still 20% faster*** than Amazon" SAN DIEGO SUPERCOMPUTER CENTER WRF 3.4.1 – 3hr forecast" *** 20% faster despite SR-IOV cluster having 20% slower CPUs"
  • 8. Quantum ESPRESSO: 5x Faster than EC2! •  48-core, 3 node calc" •  CG matrix inversion (irregular comm.)" •  3D FFT matrix transposes (All-to-all communication)" •  28% slower w/ SR-IOV" •  SR-IOV still > 500% faster*** than EC2" SAN DIEGO SUPERCOMPUTER CENTER Quantum Espresso 5.0.2 – DEISA AUSURF112 benchmark" *** 20% faster despite SR-IOV cluster having 20% slower CPUs"
  • 9. Conclusions! •  SR-IOV: huge step forward in high-performance virtualization" •  Shows substantial improvement in latency over Amazon EC2, and it provides nearly 0 bandwidth overhead! •  Benchmark application performance confirms this: significant improvement over EC2" •  SR-IOV: lowers performance barrier to virtualizing the interconnect and makes fully virtualized HPC clusters viable! •  Comet will deliver virtualized HPC to new/non-traditional communities that need flexibility without major loss of performance! SAN DIEGO SUPERCOMPUTER CENTER