LA-UR 10-05188 Implementation & Comparison of RDMA Over Ethernet. Students: Lee Gaiser, Brian Kraus, and James Wernicke. Mentors: Andree Jacobson, Susan Coulter, Jharrod LaFon, and Ben McClelland
Summary: Background, Objective, Testing Environment, Methodology, Results, Conclusion, Further Work, Challenges, Lessons Learned, Acknowledgments, References & Links, Questions
Background : Remote Direct Memory Access (RDMA) RDMA provides high-throughput, low-latency networking: it reduces consumption of CPU cycles and reduces communication latency. Images courtesy of http://www.hpcwire.com/features/17888274.html
Background : InfiniBand InfiniBand is a switched-fabric communication link designed for HPC: high throughput, low latency, quality of service, failover, scalability, and reliable transport. How do we interface this high-performance link with existing Ethernet infrastructure?
Background : RDMA over Converged Ethernet (RoCE) Provides InfiniBand-like performance and efficiency on ubiquitous Ethernet infrastructure. Uses the same transport and network layers from the IB stack and swaps the link layer for Ethernet. Implements IB verbs over Ethernet. Not quite IB strength, but it's getting close. As of OFED 1.5.1, code written for OFED RDMA auto-magically works with RoCE.
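Because a RoCE port is exposed through the same OFED/verbs device interface as an InfiniBand HCA, one quick way to see this in practice is to check what link layer each RDMA device reports. Below is a minimal sketch (not part of the original slides) that assumes the OFED userspace tool ibv_devinfo is installed and that libibverbs is recent enough to print a link_layer field; older stacks may omit it:

```python
#!/usr/bin/env python
# Sketch: list RDMA devices and the link layer each port reports.
# Assumes the OFED tool `ibv_devinfo` is on the PATH; RoCE ports report
# "Ethernet" as their link layer, native IB ports report "InfiniBand".
import subprocess

def rdma_link_layers():
    """Yield (device, link_layer) pairs parsed from `ibv_devinfo` output."""
    out = subprocess.Popen(["ibv_devinfo"],
                           stdout=subprocess.PIPE).communicate()[0]
    device = None
    for line in out.decode("ascii", "replace").splitlines():
        line = line.strip()
        if line.startswith("hca_id:"):
            device = line.split()[-1]
        elif line.startswith("link_layer:") and device is not None:
            yield device, line.split()[-1]

if __name__ == "__main__":
    for dev, layer in rdma_link_layers():
        if layer == "Ethernet":
            kind = "RoCE-capable"
        else:
            kind = "native IB"
        print("%s: link_layer=%s (%s)" % (dev, layer, kind))
```

Verbs applications and MPI stacks then discover the device through the usual ibv_get_device_list() path, which is why existing OFED RDMA code needs no changes to run over RoCE.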
Objective We would like to answer the following questions: What kind of performance can we get out of RoCE on our cluster? Can we implement RoCE in software (Soft RoCE) and how does it compare with hardware RoCE?
Testing Environment Hardware: HP ProLiant DL160 G6 servers, Mellanox MNPH29B-XTC 10GbE adapters, 50/125 OFNR cabling. Operating system: CentOS 5.3 with a 2.6.32.16 kernel. Software/drivers: Open Fabrics Enterprise Distribution (OFED) 1.5.2-rc2 (RoCE) & 1.5.1-rxe (Soft RoCE), OSU Micro-Benchmarks (OMB) 3.1.1, OpenMPI 1.4.2
Methodology Set up a pair of nodes for each technology: IB, RoCE, Soft RoCE, and no RDMA. Install, configure & run minimal services on the test nodes to maximize machine performance. Directly connect the nodes to maximize network performance. Acquire latency benchmarks (OSU MPI Latency Test) and bandwidth benchmarks (OSU MPI Uni-Directional and Bi-Directional Bandwidth Tests). Script it all to perform many repetitions.
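As a concrete illustration of that last step, here is a minimal driver-script sketch; the hostnames, repetition count, and MCA option are illustrative assumptions rather than the exact configuration used in this study:

```python
#!/usr/bin/env python
# Sketch of a benchmark driver: run each OSU micro-benchmark many times
# between a directly connected pair of nodes and keep the raw output so
# the runs can be averaged later. Hostnames, repetition count, and the
# MCA setting below are illustrative assumptions.
import subprocess

NODES = "node01,node02"                        # hypothetical test pair
TESTS = ["osu_latency", "osu_bw", "osu_bibw"]  # OMB 3.1.1 binaries
REPS = 30

def run_once(test, rep):
    cmd = ["mpirun", "-np", "2", "--host", NODES,
           "--mca", "btl", "openib,self",      # use the RDMA-capable transport
           test]
    out = subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()[0]
    log = open("%s.%02d.log" % (test, rep), "wb")
    log.write(out)
    log.close()

if __name__ == "__main__":
    for test in TESTS:
        for rep in range(REPS):
            run_once(test, rep)
```

For the no-RDMA baseline, the same script can simply swap the openib BTL for the plain TCP transport (e.g. --mca btl tcp,self).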
Results : Latency
Results : Uni-directional Bandwidth
Results : Bi-directional Bandwidth
Results : Analysis
RoCE vs. no RDMA:
Up to 5.7x speedup in latency
Up to 3.7x increase in bandwidth
IB QDR vs. RoCE:
IB less than 1 µs faster than RoCE at a 128-byte message size.
IB peak bandwidth is 2-2.5x greater than RoCE.
Further Work & Questions How does RoCE perform on collectives? Can we further optimize the RoCE configuration to yield better performance? Can we stabilize the Soft RoCE configuration? How much does Soft RoCE affect the compute nodes' ability to perform? How does RoCE compare with iWARP?
Challenges Finding an OS that works with OFED & RDMA: Fedora 13 was too new, Ubuntu 10 wasn't supported, and CentOS 5.5 was missing some drivers. Had to compile a new kernel with IB/RoCE support. Built OpenMPI 1.4.2 from source, but it wasn't configured for RDMA; we used the OpenMPI 1.4.1 supplied with OFED instead. The machines communicating via Soft RoCE frequently lock up during the OSU bandwidth tests.

Lessons Learned Installing and configuring HPC clusters. Building, installing, and fixing the Linux kernel, modules, and drivers. Working with IB, 10GbE, and RDMA technologies. Using tools such as OMB 3.1.1 and netperf for benchmarking performance.
Acknowledgments Andree Jacobson, Susan Coulter, Jharrod LaFon, Ben McClelland, Sam Gutierrez, Bob Pearson
References & Links
Subramoni, H. et al. RDMA over Ethernet – A Preliminary Study. OSU. http://nowlab.cse.ohio-state.edu/publications/conf-presentations/2009/subramoni-hpidc09.pdf
Feldman, M. RoCE: An Ethernet-InfiniBand Love Story. HPCWire.com. April 22, 2010. http://www.hpcwire.com/blogs/RoCE-An-Ethernet-InfiniBand-Love-Story-91866499.html
Woodruff, R. Access to InfiniBand from Linux. Intel. October 29, 2009. http://software.intel.com/en-us/articles/access-to-infiniband-from-linux/
OFED 1.5.2-rc2: http://www.openfabrics.org/downloads/OFED/ofed-1.5.2/OFED-1.5.2-rc2.tgz
OFED 1.5.1-rxe: http://www.systemfabricworks.com/pub/OFED-1.5.1-rxe.tgz
OMB 3.1.1: http://mvapich.cse.ohio-state.edu/benchmarks/OMB-3.1.1.tgz

Editor's notes

  1. Introduce yourself, the institute, your teammates, and what you’ve been working on for the past two months.
  2. Rephrase these bullet points with a little more elaboration.
  3. Emphasize how RDMA eliminates unnecessary communication.
  4. Explain that we are using IB QDR and what that means.
  5. Emphasize that the biggest advantage of RoCE is latency, not necessarily bandwidth. Talk about 40Gb & 100Gb Ethernet on the horizon.
  6. OSU benchmarks were more appropriate than netperf
  7. The highlight here is that latency between IB & RoCE differs by only 1.7 µs at 128-byte messages. It continues to be very close up through 4 KB messages. Also notice that the latencies of RoCE and no RDMA converge at larger message sizes.
  8. Note that RoCE & IB are very close up to a 1 KB message size. IB QDR peaks out at 3 GB/s, RoCE at 1.2 GB/s.
  9. Note that the bi-directional bandwidth trends are similar to the uni-directional ones. Explain that Soft RoCE could not complete this test. IB QDR peaks at 5.5 GB/s and RoCE peaks at 2.3 GB/s.