This document summarizes Cisco's UCS and usNIC technologies for high-performance computing. It discusses how UCS provides record-setting servers with large memory capacities, low-latency Ethernet networking, and centralized management. It then describes how usNIC allows direct userspace access to network interface cards for ultra-low latency by bypassing the operating system. Benchmarks show usNIC achieving sub-microsecond application-to-application latency.
UCS is Cisco’s x86 server line. It offers both blade and rack servers with a focus on manageability, virtualization, networking, and performance. It’s all designed to integrate smoothly with Cisco’s switching products. I’m really here to talk about usNIC, our low latency Ethernet solution for HPC.
N3K: 48 ports of 10 GbE, 12 ports of 40 GbE, 1RU
N6K: 384 ports of 10 GbE, or 96 ports of 40 GbE, 4RU
Many innovative features in UCS since we launched in 2009.
Simplifies deployment and management by cutting out specialized networks. Saves cost by reducing the number of expensive adapters that have to be plugged into each server, and the number of cables and switches that have to be purchased and installed.
usNIC lets customers finally take control of their HPC resources and save time, energy, and money by empowering IT to run the kind of compute clusters that, until now, only scientists and researchers have been operating. It also enables HPC on-demand: the same VIC that has already demonstrated world-record performance in the enterprise now delivers the speed HPC applications require, so customers can provision compute at will, from a single point, over a single network fabric.
The trick is in VLANs and QoS, which let you carve that single wire into separate slices.
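To make the slicing concrete, here is a rough sketch in plain C (nothing VIC- or NX-OS-specific, and the VLAN IDs and priority values below are invented for illustration) of the 802.1Q tag that does the carving: 12 bits of VLAN ID select the slice, and 3 PCP bits carry the QoS class.

    #include <stdint.h>
    #include <stdio.h>

    /* 802.1Q tag control information (TCI): how one physical wire is carved
     * into slices.
     *   PCP (3 bits)  - QoS priority class for the traffic on this slice
     *   DEI (1 bit)   - drop eligibility
     *   VID (12 bits) - VLAN ID, i.e. which slice the frame belongs to
     */
    static uint16_t make_tci(uint8_t pcp, uint8_t dei, uint16_t vid)
    {
        return (uint16_t)(((pcp & 0x7) << 13) | ((dei & 0x1) << 12) | (vid & 0xFFF));
    }

    int main(void)
    {
        /* Two hypothetical slices sharing the same 10 GbE wire: */
        uint16_t hpc  = make_tci(5, 0, 100); /* VLAN 100, high priority */
        uint16_t mgmt = make_tci(1, 0, 200); /* VLAN 200, low priority  */
        printf("HPC TCI: 0x%04x, mgmt TCI: 0x%04x\n", hpc, mgmt);
        return 0;
    }

On the wire, the switch's QoS policy maps those PCP values onto queues and bandwidth guarantees, which is how HPC, storage, and management traffic can share one cable without interfering.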
could poll the audience about Ethernet switch latencies
<Main point: Approximately 85% of the end-to-end latency is within the server, so let's tackle the big-ticket item>
<Click> Latency within the application depends on the application and the way it has been written and designed
<Click> The middleware layer is a big contributor as well, often taking approximately 20 µs
<Click> Kernel protocol processing is responsible for at least another 6 µs
<Click> The adapter itself adds between 3 and 6 µs, depending on the hardware vendor's design and implementation
<Click> Finally, the network elements between two servers can add up to 5 µs of latency per hop
The breakdown of these latency elements shows that approximately 85% of the latency, not counting the application latency itself, is within the server. The network contributes only 15% of the total end-to-end application latency. At Cisco, our target is to reduce the overall latency, and we are taking a holistic view in our approach.
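Rough arithmetic behind that claim, assuming a single switch hop and taking the midpoint of the adapter range: middleware ~20 µs + kernel ~6 µs + adapter ~4.5 µs ≈ 30.5 µs inside the server, versus ~5 µs in the network, and 30.5 / 35.5 ≈ 86%, which is where the ~85% figure comes from (application time excluded).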
All over *standard* Ethernet (though the VIC is required).
VT-d: Virtualization Technology for Directed I/O
IOMMU: Input/Output Memory Management Unit
SR-IOV: Single Root I/O Virtualization
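For flavor only, here is a minimal sketch of what OS bypass looks like from userspace. It uses the generic Linux verbs API rather than anything usNIC-specific, so treat the calls as illustrative of the idea: SR-IOV gives the process its own slice of the adapter, and VT-d/IOMMU lets the adapter DMA safely into a buffer the process has registered, with no per-message system calls.

    #include <stdio.h>
    #include <stdlib.h>
    #include <infiniband/verbs.h>

    /* Generic userspace device access sketch (error checks mostly elided). */
    int main(void)
    {
        int num = 0;
        struct ibv_device **devs = ibv_get_device_list(&num);
        if (!devs || num == 0) { fprintf(stderr, "no devices\n"); return 1; }

        struct ibv_context *ctx = ibv_open_device(devs[0]); /* map the HW queues */
        struct ibv_pd *pd = ibv_alloc_pd(ctx);              /* protection domain */

        /* Register a buffer: the IOMMU lets the NIC DMA straight into it. */
        char *buf = malloc(4096);
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, 4096, IBV_ACCESS_LOCAL_WRITE);

        /* From here, sends and receives are posted to queues mapped into this
         * process: no kernel protocol stack in the data path. */
        printf("opened %s, registered 4 KB at %p (lkey 0x%x)\n",
               ibv_get_device_name(devs[0]), (void *)buf, mr->lkey);

        ibv_dereg_mr(mr); free(buf); ibv_dealloc_pd(pd);
        ibv_close_device(ctx); ibv_free_device_list(devs);
        return 0;
    }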
Measurements taken on E5-2690 0 @ 2.90 GHz CPUs (Sandy Bridge) with Icehouse 40 GbE cards (PCIe Gen2, x16)
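For context on how numbers like these are usually taken: application-to-application latency is conventionally reported as half the round-trip time of a small-message ping-pong. A minimal MPI version of that measurement (not the exact harness behind these results) looks roughly like this:

    #include <mpi.h>
    #include <stdio.h>

    /* Small-message ping-pong between ranks 0 and 1; half of the average
     * round-trip time is the app-to-app latency. Illustrative only. */
    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int iters = 10000;
        char byte = 0;
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();
        if (rank == 0)
            printf("latency: %.2f usec\n", (t1 - t0) / iters / 2.0 * 1e6);

        MPI_Finalize();
        return 0;
    }

Run it with two ranks placed on two different servers so the message actually crosses the network.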