Introduction to Cisco UCS and Userspace NIC (usNIC)
Argonne National Laboratory 
September 2, 2014 
Dave Goodell 
dgoodell@cisco.com 
© 2013 Cisco and/or its affiliates. All rights reserved. 1
• Record-setting Intel Ivy Bridge 1U and 2U servers (with GPU support)
• Up to 1.5 TB RAM (yes, really!)
• Low-latency Ethernet, 10 & 40 Gbps: 1.6 usecs
• 10 & 40 Gbps top-of-rack and core switching: 190 nsecs
• Integrated Design: performance optimized for any type of workload
• Service Profiles: agility and reduced time to deploy and provision applications
• UCS Manager: role-based management, automation, ease of integration
• UCS Central: centralized, multi-domain management, alerting and visibility
• Unified Fabric: simplified infrastructure
• Virtualized I/O: security isolation per application, scale, improved performance
• Form Factor Independence: supports both blades and rack mount servers in a single domain
• Low Latency: low latency over industry standard Ethernet networking
Consolidating the messaging/interconnect network
• Traditional Network: separate LAN (Ethernet), FC, and Infiniband cluster networks
• Unified Fabric: LAN, FC, and cluster traffic over Ethernet (DCB, FCoE & Low Latency)
• Benefits 
• Low Latency Ethernet delivers high performance while 
retaining all the advantages of managing unified network 
fabric 
• HPC Compute Clusters can coexist with Enterprise IT 
under same management framework 
• Leverage True Hybrid Solutions From All IT Resources 
• Simplifies Procurement 
• Accelerates Deployment 
• Non Intrusive 
• Extends the Product Life Cycle / Reusability 
Lower CAPEX and OPEX 
One wire to rule them all: 
• OS Mgmt Traffic (e.g., ssh) 
• Server Hardware Mgmt 
• File System / IO Traffic 
• MPI / Application Traffic 
Cisco CIMC 
Rich XML Interface 
Unified Management 
10 & 40 Gbps Ethernet 
With QoS 
HPC Networking / 
Routing 
Host Port to Switch Port
• eth0: VLAN 27, MTU 1500B, Bandwidth: 100 Mbps
• eth1: VLAN 42, MTU 9000B, Bandwidth: 2 Gbps
• eth2: VLAN 64, MTU 9000B, Bandwidth: Not limited
[Diagram: eth2 is a PCIe Physical Function whose Virtual Functions expose isolated HW resources (RX/TX queue pairs) to an MPI process on the CPU; an SSH process uses eth0]
Characteristics 
• Up to 20 Chassis (160 Blades) 
• 3840 CPU Cores 
• 20 Gbps Bandwidth/Blade 
• Burst Capacity up to 80 Gbps 
• Single Wire Management 
• Enterprise & HPC 
• Pod Architecture 
• Scalable 
• 96 or 48 ports
• 5.3 usecs any-to-any latency
• Up to 82.94 TeraFLOPs (Intel Ivy Bridge)
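The core count and peak figure on this slide can be reproduced with simple arithmetic. The sketch below is only a plausibility check: 8 blades per chassis follows from 160 / 20, but two sockets per blade, 12 cores per socket, a 2.7 GHz clock, and 8 double-precision FLOPs per cycle per core are assumptions that do not appear on the slide. They were chosen because they reproduce the quoted numbers.

```python
# All parameters marked "assumption" are NOT stated on the slide.
CHASSIS = 20
BLADES_PER_CHASSIS = 8      # 160 blades / 20 chassis
SOCKETS_PER_BLADE = 2       # assumption
CORES_PER_SOCKET = 12       # assumption
CLOCK_GHZ = 2.7             # assumption
FLOPS_PER_CYCLE = 8         # assumption: double precision, per core


def total_cores():
    """Cores in a fully populated 20-chassis pod."""
    return CHASSIS * BLADES_PER_CHASSIS * SOCKETS_PER_BLADE * CORES_PER_SOCKET


def peak_tflops():
    """Theoretical peak = cores x GHz x FLOPs per cycle, converted GFLOPs -> TFLOPs."""
    return total_cores() * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000.0


if __name__ == "__main__":
    print(total_cores())            # 3840
    print(round(peak_tflops(), 2))  # 82.94
```

This is a theoretical peak; sustained application performance depends on the workload.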
• C220 M3 - 1RU Dual Socket Rack Server (Up to 384 GB RAM)
• C240 M3 - 2RU Dual Socket Compute OR Storage Rack Server
• C420 M3 - 2RU Dual OR Quad Socket Server (Up to 1.5 TB RAM)
Each model is shown with 3rd party GPU expansion.
Port-to-Port Latency
• Nexus 3548 (48 Port x 10 Gbps, 12 x 40 Gbps): 190 nsecs
• Nexus 3172PQ (72 Port x 10 Gbps, 6 x 40 Gbps): <500 nsecs
• Nexus 3132Q (32 Port x 40 Gbps): <500 nsecs
• Nexus 9000: <500 nsecs
9504 - 144 Port x 40 Gbps
9508 - 288 Port x 40 Gbps
9516 - 576 Port x 40 Gbps
App to App Latency Components
[Bar chart: latency (usecs) for usNIC and TCP/IP, broken down into middleware, kernel, NIC, and network]
• usNIC (kernel bypass using SR-IOV): 2.02 usecs
• TCP/IP (kernel overhead): 9.42 usecs
• HW resource isolation using IOMMU
• Dual functionality: TCP/IP and usNIC
• Direct access to NIC hardware from 
Linux userspace 
Operating System bypass 
via the Linux Verbs API (UD) 
• Utilizes Cisco Virtual Interface Card 
(VIC) for ultra-low Ethernet latency 
2nd generation 80Gbps Cisco ASIC 
2 x 10Gbps Ethernet ports, or 
2 x 40Gbps Ethernet ports 
PCI and mezzanine form factors 
• Half-round trip (HRT) ping-pong 
latencies (Intel E5-2690 v2 servers): 
Raw back to back: 1.57μs 
MPI back to back: 1.85μs 
Through MPI+N3548: 2.02μs 
These numbers keep going down
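Half-round-trip latency is the time for one ping-pong exchange divided by two, averaged over many iterations. The sketch below shows only that measurement method. It uses a local `socket.socketpair()` and a thread as the echo side, which is an assumption made so the example runs on one machine; it does not use usNIC or MPI, and its results are not comparable to the figures on the slide. The function names are hypothetical.

```python
import socket
import threading
import time


def recv_exact(sock, size):
    """Read exactly `size` bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed")
        buf.extend(chunk)
    return bytes(buf)


def echo(sock, iterations, size):
    """Pong side: send every message straight back."""
    for _ in range(iterations):
        sock.sendall(recv_exact(sock, size))


def half_round_trip_us(iterations=1000, size=8):
    """Return the mean half-round-trip time in microseconds."""
    ping, pong = socket.socketpair()
    worker = threading.Thread(target=echo, args=(pong, iterations, size))
    worker.start()
    msg = b"x" * size
    start = time.perf_counter()
    for _ in range(iterations):
        ping.sendall(msg)        # ping
        recv_exact(ping, size)   # wait for pong
    elapsed = time.perf_counter() - start
    worker.join()
    ping.close()
    pong.close()
    # One iteration is a full round trip, so halve it.
    return elapsed / (2 * iterations) * 1e6


if __name__ == "__main__":
    print(f"HRT: {half_round_trip_us():.2f} us")
```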
• 2nd generation VIC: 
Can present itself 256 times on the 
PCI bus 
Has enough hardware queues / 
buffering for 256 actual NICs 
• Created for virtualization 
Designed for hypervisor bypass 
• Intent: 
Each vNIC assigned to a single 
virtual machine 
Can therefore bypass hypervisor 
“Bare metal” network performance in 
a VM 
[Diagram: one VIC with two physical ports presents six vNICs; each vNIC is a PCI Physical Function (PF) with its own MAC address, aa:bb:cc:dd:ee:fa through aa:bb:cc:dd:ee:ff]
[Diagram: three VMs, each with an app, guest kernel, and guest driver; the data path runs through the hypervisor (virtual switch, host driver) to PCI PFs on the VIC]
[Diagram: three VMs, each with an app, guest kernel, and guest driver; hypervisor with virtual switch and host driver; the data path reaches PCI VFs and a PCI PF on the VIC]
[Diagram: apps running as user processes with a user space driver; host OS with host TCP/IP stack, virtual switch, and host driver; the data path reaches PCI VFs and a PCI PF on the VIC]
TCP/IP
• Userspace: Application, Userspace sockets library
• Kernel: TCP stack, General Ethernet driver, Cisco VIC driver
• Cisco VIC hardware
usNIC
• Userspace: Application, Userspace verbs library
• Kernel (bootstrapping and setup): Verbs IB core, Cisco USNIC driver
• Send and receive fast path goes from userspace to the Cisco VIC hardware
• MPI directly injects L2 frames (with UDP/IP payloads)
• MPI receives L2 frames directly from the VIC
[Diagram: MPI, Userspace verbs library, Cisco VIC hardware]
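To make "L2 frames with UDP/IP payloads" concrete, here is a minimal sketch that assembles an Ethernet II frame carrying an IPv4/UDP datagram in plain bytes. It illustrates the standard header layout only and is not the usNIC or Open MPI code. The IP addresses, ports, TTL, and helper names are assumptions; the MAC addresses are borrowed from the deck's example diagrams. The UDP checksum is left at zero, which IPv4 permits.

```python
import socket
import struct


def ipv4_checksum(header: bytes) -> int:
    """Ones-complement sum of 16-bit words over an IPv4 header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_frame(dst_mac, src_mac, src_ip, dst_ip, src_port, dst_port, payload):
    """Return Ethernet header + IPv4 header + UDP header + payload."""
    eth = dst_mac + src_mac + struct.pack("!H", 0x0800)  # EtherType IPv4
    udp_len = 8 + len(payload)
    # UDP checksum 0 means "not computed" (allowed over IPv4).
    udp = struct.pack("!HHHH", src_port, dst_port, udp_len, 0)
    ip = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,            # version 4, header length 5 words
        0,               # DSCP/ECN
        20 + udp_len,    # total length
        0,               # identification
        0x4000,          # flags: don't fragment
        64,              # TTL (assumption)
        17,              # protocol: UDP
        0,               # checksum placeholder
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
    )
    ip = ip[:10] + struct.pack("!H", ipv4_checksum(ip)) + ip[12:]
    return eth + ip + udp + payload


if __name__ == "__main__":
    frame = build_frame(
        bytes.fromhex("aabbccddeeff"), bytes.fromhex("aabbccddeefe"),
        "10.0.0.1", "10.0.0.2", 4000, 4001, b"hello",
    )
    print(len(frame), frame.hex())
```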
[Diagram: x86 chipset with VT-d I/O MMU; VIC (SR-IOV NIC) with a classifier and one QP per MPI process; inbound L2 frames pass through the classifier to the QPs, and outbound L2 frames leave from the QPs]
[Diagram: a VIC with two physical ports; each port has a Physical Function (PF) with its own MAC address (aa:bb:cc:dd:ee:fe and aa:bb:cc:dd:ee:ff), and each PF has multiple VFs with QPs]
[Diagram: a VIC with two physical ports, each with a PF (MAC) and several VFs; two MPI processes reach QPs on the VFs through the Intel IO MMU]
• Used for physical ↔ virtual memory translation
• usnic verbs driver programs (and de-programs) the IOMMU
[Diagram: the userspace process and the VIC both use virtual addresses; the Intel IO MMU translates them to physical RAM addresses]
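A toy model can show what programming and de-programming a translation table means. The sketch below is conceptual only: the class, method names, and 4 KiB page size are assumptions, and it does not represent the usnic driver or any real IOMMU interface. It maps device-visible page numbers to physical page numbers and faults on any address that has not been programmed.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


class ToyIommu:
    """Hypothetical page-granular translation table."""

    def __init__(self):
        self._table = {}  # device-visible page number -> physical page number

    def map(self, io_addr, phys_addr, length):
        """Program translations covering `length` bytes."""
        if io_addr % PAGE_SIZE or phys_addr % PAGE_SIZE:
            raise ValueError("addresses must be page aligned")
        pages = (length + PAGE_SIZE - 1) // PAGE_SIZE
        for i in range(pages):
            self._table[io_addr // PAGE_SIZE + i] = phys_addr // PAGE_SIZE + i

    def unmap(self, io_addr, length):
        """De-program the translations for a previously mapped range."""
        pages = (length + PAGE_SIZE - 1) // PAGE_SIZE
        for i in range(pages):
            self._table.pop(io_addr // PAGE_SIZE + i, None)

    def translate(self, io_addr):
        """Translate a device-visible address, or fault if unmapped."""
        page, offset = divmod(io_addr, PAGE_SIZE)
        if page not in self._table:
            raise PermissionError(f"DMA fault at {io_addr:#x}")
        return self._table[page] * PAGE_SIZE + offset


if __name__ == "__main__":
    iommu = ToyIommu()
    iommu.map(0x10000, 0x7F000, 8192)
    print(hex(iommu.translate(0x10010)))
```

The fault on unmapped addresses is what keeps one process's device access away from another process's memory.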
• Do you know what these are? 
MAC address 
IP Subnet 
ARP 
GID 
LID 
GRH 
• Manage your Ethernet network however you want 
• Manage and monitor UDP/IP traffic with standard tools 
• Can use IP routing + ECMP to create spine+leaf (Clos) networks 
• Incrementally grow deployments without rejiggering existing sub-cluster 
subnet config 
• No additional cost for IP: Cisco switches route L2/L3 at same 
speed 
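ECMP generally spreads traffic by hashing per-flow header fields, so every packet of one flow takes the same path while different flows spread across the equal-cost next hops. The sketch below illustrates that idea for UDP flows. The function name, the spine names, and the use of CRC32 are assumptions for illustration; real switches use their own hash functions and field selections.

```python
import zlib


def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, next_hops, proto=17):
    """Pick one equal-cost next hop from a hash of the flow's 5-tuple.

    proto=17 is UDP. CRC32 stands in for a hardware hash (assumption).
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


if __name__ == "__main__":
    spines = ["spine1", "spine2", "spine3", "spine4"]  # hypothetical names
    for port in range(4000, 4004):
        print(port, ecmp_next_hop("10.0.1.5", "10.0.2.9", port, 5000, spines))
```

Because the UDP source port feeds the hash, flows between the same two hosts can land on different spine switches.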
• Design Principle: Behave like OS network stack as much as 
possible! 
• Examples 
Routing 
ARP 
UDP/IP port usage + visibility 
MAC in L2 frames 
• Can’t always achieve full parity 
exotic routing configurations (e.g., ip rule add blackhole …) 
tcpdump (no OS in datapath*)
1. call ibv_create_qp()
2. allocates a full Linux UDP socket w/ port in OS tables (shows up in lsof/netstat)
3. pass to kmod w/ create_qp command
4. bump refcount before installing filter, prevents freeing socket before QP destruction
[Diagram: MPI, libibverbs, and libusnic_verbs in user space; usnic_verbs.ko in the kernel]
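Step 2 relies on ordinary socket behavior: once a UDP socket is bound, the OS records the port and refuses to give it to anyone else while the socket is open. The sketch below demonstrates only that behavior with the standard `socket` module. It does not call libibverbs or the kernel module, and the helper names and the loopback address are assumptions.

```python
import socket


def reserve_udp_port(addr="127.0.0.1"):
    """Bind a UDP socket to an OS-chosen port and return (socket, port).

    While the socket stays open, the port is listed in the OS tables.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((addr, 0))  # port 0 asks the OS to pick a free port
    return sock, sock.getsockname()[1]


def port_is_taken(port, addr="127.0.0.1"):
    """Return True if a second bind to the same address and port fails."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        probe.bind((addr, port))
    except OSError:
        return True
    finally:
        probe.close()
    return False


if __name__ == "__main__":
    sock, port = reserve_udp_port()
    print(port, port_is_taken(port))
    sock.close()
```

Holding a reference to the socket for the life of the QP, as in step 4, keeps this reservation in place until the QP is destroyed.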
• Open MPI natively supports multi-rail 
• Open MPI automagic configuration philosophy (when possible) 
• VICs have 2 ports, can have >1 VIC per server 
• Want to avoid artificial contention 
pair local interfaces with remote interfaces 
• Remote MPI process might be on the same subnet, might not 
• Nontrivial software problem 
Example Interface Pairing
[Diagram: Host A (NIC A1, NIC A2) and Host B (NIC B1, NIC B2), with MPI processes P1 and P2, shown before pairing, in valid pairing 1, and in valid pairing 2. Key: possible connectivity; OMPI selected pairing; an MPI process]
[Diagram: Host A (NIC A1, NIC A2) and Host B (NIC B1, NIC B2) on Subnet S1 and Subnet S2, with NICs R1a, R2a, R1b, and R2b between the subnets, all attached to a switch (does not need L3 capability)]
Matching Logic Must Watch For Sub-optimal Pairings
• A1 can reach B1 and B2
• A2 can only reach B1
Case 1 (sub-optimal)
• A2 cannot pair with any interface on Host B
• reduces aggregate bandwidth
Case 2 (desired)
• Both Host A interfaces can pair with Host B interfaces
[Diagram: Host A (NIC A1, NIC A2) and Host B (NIC B1, NIC B2), shown with the reachability and with each case's pairing]
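The two cases can be reproduced by treating interface pairing as a bipartite matching problem. The sketch below contrasts a first-come greedy choice, which yields Case 1, with a standard augmenting-path maximum matching, which yields Case 2. It illustrates the problem and is not presented as Open MPI's actual algorithm; the function names are hypothetical.

```python
def greedy_pairing(reach):
    """Give each local interface the first free remote interface it can reach."""
    used, pairs = set(), {}
    for local, remotes in reach.items():
        for remote in remotes:
            if remote not in used:
                used.add(remote)
                pairs[local] = remote
                break
    return pairs


def max_pairing(reach):
    """Maximum bipartite matching using augmenting paths.

    `reach` maps each local interface to the remote interfaces it can reach.
    Returns a dict of local -> remote pairs.
    """
    match = {}  # remote -> local

    def try_assign(local, seen):
        for remote in reach[local]:
            if remote in seen:
                continue
            seen.add(remote)
            # Take a free remote, or move its current owner elsewhere.
            if remote not in match or try_assign(match[remote], seen):
                match[remote] = local
                return True
        return False

    for local in reach:
        try_assign(local, set())
    return {local: remote for remote, local in match.items()}


if __name__ == "__main__":
    reach = {"A1": ["B1", "B2"], "A2": ["B1"]}
    print(greedy_pairing(reach))  # A1 takes B1, A2 is left unpaired
    print(max_pairing(reach))     # A1 moves to B2 so A2 can use B1
```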
1.88 μs on this SB machine 
• Everything above the 
firmware is open source 
• Open MPI 
Distributing in Cisco Open MPI 
v1.6.5 (soon to be v1.8.2) 
Upstream in Open MPI v1.7.3 and 
beyond (current stable is v1.8.1) 
• Libibverbs plugin 
• Verbs kernel module 
• 3rd Generation VIC 
2 x 40G and PCIe gen 3 
More MPI offload to hardware 
• Software update (expected this week) 
Upgrade transport from custom L2 protocol to UDP 
Key rationale point: Cisco switches L2 and L3 at same speed 
Allows switching usNIC traffic around data center 
Allows easier monitoring and policy control of usNIC traffic 
Kernel + userspace support for RHEL 7.0, SLES 12 
Open MPI optimizations for 3rd generation VIC 
Thank you.

Finology Group – Insurtech Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Introduction to Cisco UCS and Userspace NIC (usNIC)

  • 1. Introduction to Cisco UCS and Userspace NIC (usNIC). Argonne National Laboratory, September 2, 2014. Dave Goodell, dgoodell@cisco.com. © 2013 Cisco and/or its affiliates. All rights reserved.
  • 2. Record-setting Intel Ivy Bridge 1U and 2U servers (with GPU support); low-latency Ethernet; up to 1.5 TB RAM (yes, really!); 10 & 40 Gbps top-of-rack and core switching; 1.6 usecs; 190 nsecs; 10 & 40 Gbps!
  • 3. Integrated Design: performance optimized for any type of workload. Service Profiles: agility and reduced time to deploy and provision applications. UCS Manager: role-based management, automation, ease of integration. UCS Central: centralized, multi-domain management, alerting and visibility. Unified Fabric: simplified infrastructure. Virtualized I/O: security isolation per application, scale, improved performance. Form Factor Independence: supports both blades and rack-mount servers in a single domain. Low Latency: low latency over industry-standard Ethernet networking.
  • 4. Consolidating the messaging/interconnect network. Diagram labels: Traditional Network; Unified Fabric; LAN; Ethernet; FC; Infiniband; Cluster; DCB, FCoE & Low Latency.
  • 5. Benefits: low-latency Ethernet delivers high performance while retaining all the advantages of managing a unified network fabric; HPC compute clusters can coexist with enterprise IT under the same management framework; leverage true hybrid solutions from all IT resources; simplifies procurement; accelerates deployment; non-intrusive; extends the product life cycle / reusability. Lower CAPEX and OPEX.
  • 6. One wire to rule them all: OS mgmt traffic (e.g., ssh); server hardware mgmt; file system / IO traffic; MPI / application traffic. Diagram labels: Cisco CIMC; rich XML interface; unified management; 10 & 40 Gbps Ethernet with QoS; HPC networking / routing.
  • 7. Host port; switch port. Interfaces: eth0, eth1, eth2. Per-interface settings: VLAN 27, MTU 1500B, bandwidth 100 Mbps; VLAN 42, MTU 9000B, bandwidth 2 Gbps; VLAN 64, MTU 9000B, bandwidth not limited. Diagram labels: PCIe Physical Function; eth2; isolated HW resource; Virtual Functions; RX/TX queue pairs; CPU; MPI process; SSH process; eth0.
  • 8. Characteristics: up to 20 chassis (160 blades); 3840 CPU cores; 20 Gbps bandwidth per blade; burst capacity up to 80 Gbps; single-wire management; enterprise & HPC; pod architecture; scalable 96 or 48 ports. 5.3 usecs any-to-any latency. Up to 82.94 TeraFLOPs (Intel Ivy Bridge).
  • 9. C220 M3: 1RU dual-socket rack server (up to 384 GB RAM), 3rd-party GPU expansion. C240 M3: 2RU dual-socket compute or storage rack server, 3rd-party GPU expansion. C420 M3: 2RU dual- or quad-socket server (up to 1.5 TB RAM), 3rd-party GPU expansion.
  • 10. Port-to-port latency. Nexus 3548 (48 ports x 10 Gbps, 12 x 40 Gbps): 190 nsecs. Nexus 3172PQ (72 ports x 10 Gbps, 6 x 40 Gbps): <500 nsecs. Nexus 3132Q (32 ports x 40 Gbps): <500 nsecs. Nexus 9000 (9504: 144 ports x 40 Gbps; 9508: 288 ports x 40 Gbps; 9516: 576 ports x 40 Gbps): <500 nsecs.
  • 12. App-to-app latency components. Kernel bypass: 2.02 usecs using SR-IOV. Kernel overhead: 9.42 usecs. [Bar chart: latency (usecs) for usNIC and TCP/IP, broken down into middleware, kernel, NIC, and network.] HW resource isolation using IOMMU. TCP/IP and usNIC: dual functionality!
  • 13. Direct access to NIC hardware from Linux userspace: operating system bypass via the Linux Verbs API (UD). Utilizes the Cisco Virtual Interface Card (VIC) for ultra-low Ethernet latency: 2nd generation 80 Gbps Cisco ASIC; 2 x 10 Gbps Ethernet ports, or 2 x 40 Gbps Ethernet ports; PCI and mezzanine form factors. Half-round trip (HRT) ping-pong latencies (Intel E5-2690 v2 servers): raw back to back: 1.57 μs; MPI back to back: 1.85 μs; through MPI+N3548: 2.02 μs. These numbers keep going down.
  • 14. 2nd generation VIC: can present itself 256 times on the PCI bus; has enough hardware queues / buffering for 256 actual NICs. Created for virtualization: designed for hypervisor bypass. Intent: each vNIC is assigned to a single virtual machine and can therefore bypass the hypervisor; "bare metal" network performance in a VM.
  • 15. [Diagram: a VIC with two physical ports presenting six vNICs, each a PCI Physical Function (PF) with its own MAC address, aa:bb:cc:dd:ee:fa through aa:bb:cc:dd:ee:ff.]
  • 16. Diagram labels: VM; App; guest kernel; guest driver; hypervisor; virtual switch; host driver; data path; VIC; PCI PF.
  • 17. Diagram labels: VM; App; guest kernel; guest driver; hypervisor; virtual switch; host driver; data path; VIC; PCI VF; PCI PF.
  • 18. Diagram labels: VM; App; user process; user space driver; hypervisor; virtual switch; host driver; data path; VIC; PCI VF; PCI PF; host OS; host TCP/IP stack.
  • 19. TCP/IP stack: application; userspace sockets library; kernel TCP stack; general Ethernet driver; Cisco VIC driver; Cisco VIC hardware. usNIC stack: application; userspace verbs library; bootstrapping and setup (Verbs IB core, Cisco USNIC driver); send and receive fast path; Cisco VIC hardware.
  • 20. Diagram labels: MPI; userspace verbs library; Cisco VIC hardware. MPI receives L2 frames directly from the VIC. MPI directly injects L2 frames (with UDP/IP payloads).
  • 21. Diagram labels: x86 chipset; VT-d; I/O MMU; VIC; SR-IOV NIC; MPI process (x2); classifier; QP (x2); inbound L2 frames; outbound L2 frames.
  • 22. Diagram labels: VIC; two Physical Functions (PF) with MAC addresses aa:bb:cc:dd:ee:fe and aa:bb:cc:dd:ee:ff; Virtual Functions (VF); queue pairs (QP); two physical ports.
  • 23. Diagram labels: VIC; two PFs (MAC); VFs; QPs; Intel IO MMU; two MPI processes; two physical ports.
  • 24. Used for physical ↔ virtual memory translation. The usnic verbs driver programs (and de-programs) the IOMMU. Diagram labels: userspace process; virtual; physical; Intel IO MMU; VIC; RAM.
  • 26. Do you know what these are? MAC address; IP subnet; ARP; GID; LID; GRH.
  • 27. Manage your Ethernet network however you want. Manage and monitor UDP/IP traffic with standard tools. Can use IP routing + ECMP to create spine+leaf (Clos) networks. Incrementally grow deployments without rejiggering existing sub-cluster subnet config. No additional cost for IP: Cisco switches route L2/L3 at the same speed.
  • 28. Design principle: behave like the OS network stack as much as possible! Examples: routing; ARP; UDP/IP port usage + visibility; MAC in L2 frames. Can't always achieve full parity: exotic routing configurations (e.g., ip rule add blackhole …); tcpdump (no OS in datapath*).
  • 29. (1) Call ibv_create_qp(). (2) Allocates a full Linux UDP socket w/ port in OS tables. (3) Pass to kmod w/ create_qp command. (4) Bump refcount before installing filter; prevents freeing the socket before QP destruction. Layers: MPI, libibverbs, libusnic_verbs (user space); usnic_verbs.ko (kernel). Shows up in lsof/netstat.
  • 30. Open MPI natively supports multi-rail. Open MPI automagic configuration philosophy (when possible). VICs have 2 ports; can have >1 VIC per server. Want to avoid artificial contention: pair local interfaces with remote interfaces. Remote MPI process might be on the same subnet, might not. Nontrivial software problem.
  • 31. Example interface pairing. [Diagram: Host A (NIC A1, NIC A2) and Host B (NIC B1, NIC B2) with MPI processes P1 and P2, shown in three panels: before pairing, valid pairing 1, valid pairing 2. Key: possible connectivity; OMPI selected pairing; an MPI process.]
  • 32. Diagram labels: Host A (NIC A1, NIC A2); Host B (NIC B1, NIC B2); NIC R1a; NIC R2a; NIC R1b; NIC R2b; Subnet S1; Subnet S2; switch (does not need L3 capability).
  • 33. Matching logic must watch for sub-optimal pairings. Host A (NIC A1, NIC A2) and Host B (NIC B1, NIC B2): A1 can reach B1 and B2; A2 can only reach B1. Case 1 (sub-optimal): A2 cannot pair with any interface on Host B; reduces aggregate bandwidth. Case 2 (desired): both Host A interfaces can pair with Host B interfaces.
  • 35. 1.88 μs on this SB machine
  • 37. Everything above the firmware is open source. Open MPI: distributing in Cisco Open MPI v1.6.5 (soon to be v1.8.2); upstream in Open MPI v1.7.3 and beyond (current stable is v1.8.1). Libibverbs plugin. Verbs kernel module.
  • 38. 3rd generation VIC: 2 x 40G and PCIe gen 3; more MPI offload to hardware. Software update (expected this week): upgrade transport from custom L2 protocol to UDP (key rationale point: Cisco switches L2 and L3 at the same speed; allows switching usNIC traffic around the data center; allows easier monitoring and policy control of usNIC traffic); kernel + userspace support for RHEL 7.0, SLES 12; Open MPI optimizations for 3rd generation VIC.
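
The interface-pairing problem on slides 30 to 33 can be illustrated as a bipartite matching. The sketch below is not Open MPI's actual pairing code; it is a minimal augmenting-path matcher, with the hypothetical names `pair_interfaces` and `reach`, applied to the slide 33 topology (A1 reaches B1 and B2; A2 reaches only B1). A first-fit choice would give A1 the link to B1 and strand A2 (case 1); the matcher re-routes A1 to B2 so that both interfaces are paired (case 2).

```python
from typing import Dict, List, Set


def pair_interfaces(reach: Dict[str, List[str]]) -> Dict[str, str]:
    """Return a maximum local -> remote pairing (hypothetical helper)."""
    owner: Dict[str, str] = {}  # remote interface -> local interface using it

    def try_assign(local: str, seen: Set[str]) -> bool:
        for remote in reach[local]:
            if remote in seen:
                continue
            seen.add(remote)
            # Take a free remote, or move its current owner to another remote.
            if remote not in owner or try_assign(owner[remote], seen):
                owner[remote] = local
                return True
        return False

    for local in reach:
        try_assign(local, set())
    return {local: remote for remote, local in owner.items()}


# Reachability from slide 33: A1 reaches B1 and B2; A2 reaches only B1.
reach = {"A1": ["B1", "B2"], "A2": ["B1"]}
pairing = pair_interfaces(reach)
print(pairing)
```

A production implementation would presumably also weigh link bandwidth and subnet locality, which this sketch ignores.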

Editor's notes

  1. UCS is Cisco’s x86 server line. It offers both blade and rack servers with a focus on manageability, virtualization, networking, and performance. It’s all designed to integrate smoothly with Cisco’s switching products. I’m really here to talk about usNIC, our low latency Ethernet solution for HPC. N3K: 48 ports of 10 Gb, 12 ports of 40 Gb, 1RU. N6K: 384 ports of 10 Gb, or 96 ports of 40 Gb, 4RU.
  2. Many innovative features in UCS since we launched in 2009.
  3. Simplifies deployment and management by cutting out specialized networks. Saves costs by reducing the number of expensive adapters that need to be plugged into a server and reducing the number of cables and switches that need to be purchased and installed.
  4. usNIC allows customers to finally take control of their HPC resources and save time, energy and money by empowering IT to do what only scientists and researchers have been doing with compute clusters. This technology also enables HPC On-Demand in that the same VIC which already demonstrated world-record performance in the enterprise now enables the speed HPC applications require. Customers can now provision compute at will from a single point over a single network fabric.
  5. The trick is in VLANs and QoS, allowing you to carve that single wire into separate slices.
  6. Could poll the audience about Ethernet switch latencies.
  7. <Main point: approximately 85% of the end-to-end latency is within the server; let's tackle the big-ticket item.> <Click> Latency within the application depends on the application and the way it has been written and designed. <Click> The middleware layer is a big contributor as well, often taking approximately 20 usecs. <Click> Kernel protocol processing is responsible for at least another 6 usecs. <Click> The adapter itself adds between 3 and 6 usecs, depending on the hardware vendor's design and implementation. <Click> Finally, the network elements between two servers can add up to 5 usecs of latency per hop. The breakdown of these latency elements shows that approximately 85% of the latency, not counting the application latency itself, is within the server. The network contributes only 15% of the total end-to-end application latency. At Cisco, our target is to reduce the overall latency, and we are taking a holistic view in our approach.
  8. All over *standard* Ethernet (though the VIC is required).
  9. VT-d: Virtualization Technology for Directed I/O. IOMMU: input/output memory management unit. SR-IOV: Single Root I/O Virtualization.
  10. Measurements taken on E5-2690 0 @ 2.90GHz CPUs (Sandy Bridge) with Icehouse 40 GbE cards (PCIe Gen2, x16)
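
The "approximately 85% within the server" figure in note 7 can be checked with a little arithmetic. The sketch below takes the quoted component latencies at face value and assumes a single network hop at the full 5 usecs; both the single-hop assumption and the function name `server_share` are illustrative, not from the talk.

```python
# Component latencies quoted in note 7, in microseconds.
MIDDLEWARE_US = 20.0
KERNEL_US = 6.0
ADAPTER_US_RANGE = (3.0, 6.0)
NETWORK_US_PER_HOP = 5.0


def server_share(adapter_us: float, hops: int = 1) -> float:
    """Fraction of end-to-end latency spent inside the server (assumed model)."""
    server = MIDDLEWARE_US + KERNEL_US + adapter_us
    network = NETWORK_US_PER_HOP * hops  # assumption: each hop costs the full 5 usecs
    return server / (server + network)


low, high = (server_share(a) for a in ADAPTER_US_RANGE)
print(f"server share: {low:.1%} to {high:.1%}")
```

Under these assumptions the server-side share comes out at roughly 85 to 86 percent, consistent with the note; more hops would lower it.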