Mellanox Technology Update
Phil Hofstad
Sept 2013

Mellanox Overview
Ticker: MLNX
Leading provider of high-throughput, low-latency server and storage interconnect
• FDR 56Gb/s InfiniBand and 10/40/56GbE
• Reduces application wait time for data
• Dramatically increases ROI on data center infrastructure
Company headquarters:
• Yokneam, Israel; Sunnyvale, California
• ~1,200 employees worldwide (as of June 2013)
Solid financial position
• Record revenue in FY12: $500.8M, up 93% year-over-year
• Q2'13 revenue of $98.2M
• Q3'13 guidance of ~$104M to $109M
• Cash + investments of $411.3M as of 6/30/13
Virtual Protocol Interconnect (VPI) Technology
Providing End-to-End Interconnect Solutions

Comprehensive end-to-end software accelerators and management, spanning the networking, storage, clustering, and management layers from applications down to the switch OS:
• MXM – Mellanox Messaging Acceleration
• FCA – Fabric Collectives Acceleration
• UFM – Unified Fabric Management
• VSA – Storage Accelerator (iSCSI)
• UDA – Unstructured Data Accelerator

VPI adapters and switches with acceleration engines; switch configurations include:
• 64 ports 10GbE
• 36 ports 40/56GbE
• 48 ports 10GbE + 12 ports 40/56GbE
• 36 ports InfiniBand up to 56Gb/s
• Up to 8 VPI subnets
Comprehensive End-to-End InfiniBand and Ethernet Solutions Portfolio
• ICs
• Adapter cards (including LOM and mezzanine form factors)
• Switches/gateways
• Long-haul systems – from data center to campus and metro connectivity
• Cables/modules
InfiniBand: 10/20/40/56 Gb/s; Ethernet: 10/40/56 Gb/s
MetroX™ and MetroDX™
Data Center Expansion Example – Disaster Recovery
MetroX™ and MetroDX™ extend InfiniBand and Ethernet RDMA reach:
• Fastest interconnect over 40Gb/s InfiniBand or Ethernet links
• Support for multiple distances
• Simple management to control distant sites
• Low-cost, low-power, long-haul solution
40Gb/s over campus and metro

Key Elements in a Data Center Interconnect
IPtronics and Kotura Complete Mellanox's 100Gb/s+ Technology
Recent acquisitions of Kotura and IPtronics enable Mellanox to deliver complete high-speed optical interconnect solutions for 100Gb/s and beyond, covering:
• Servers, storage, and applications
• Switches and ICs
• Adapters and ICs
• Cables, silicon photonics, parallel optical modules
Mellanox InfiniBand Paves the Road to Exascale
Leading Interconnect, Leading Performance
[Roadmap chart, 2001-2017: bandwidth rises from 10Gb/s through 20, 40, 56, and 100Gb/s toward 200Gb/s, while latency falls from 5usec through 2.5, 1.3, 0.7, and 0.6usec to below 0.5usec – all behind the same software interface.]
Connect-IB: The Exascale Foundation
World's first 100Gb/s interconnect adapter
• PCIe 3.0 x16, dual FDR 56Gb/s InfiniBand ports providing >100Gb/s
Highest InfiniBand message rate: 137 million messages per second
• 4X higher than other InfiniBand solutions
<0.7 microsecond application latency
Unmatchable storage performance
• 8,000,000 IOPS (1 QP), 18,500,000 IOPS (32 QPs)
New innovative transport – Dynamically Connected Transport service
Supports Scalable HPC with MPI, SHMEM and PGAS/UPC offloads
Supports GPUDirect RDMA for direct GPU-to-GPU communication
Architectural foundation for exascale computing – enter the world of boundless performance
Dynamically Connected Transport Advantages
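The headline DC advantage is connection-state scalability: with the classic Reliably Connected transport every process pair needs its own queue pair, while a DC initiator is reused against any target. A minimal back-of-the-envelope sketch in C; the cluster size, per-QP memory, and DCI pool size are illustrative assumptions, not Mellanox specifications:

    #include <stdio.h>

    /* Illustrative comparison of per-node connection state for the
     * Reliably Connected (RC) vs. Dynamically Connected (DC) transports.
     * All sizes below are assumptions, not vendor specifications. */
    int main(void)
    {
        long long nodes = 10000;      /* hypothetical cluster size        */
        long long ppn = 16;           /* processes per node               */
        long long qp_bytes = 4096;    /* assumed memory per RC queue pair */
        long long dcis = 8;           /* assumed DC-initiator pool size   */

        /* RC: every local process keeps one QP per remote process. */
        long long rc_qps = ppn * (nodes * ppn - 1);
        /* DC: a small fixed pool of initiators is reused for all targets. */
        long long dc_qps = ppn * dcis;

        printf("RC QPs per node: %lld (~%lld MB)\n",
               rc_qps, rc_qps * qp_bytes >> 20);
        printf("DC QPs per node: %lld (~%lld KB)\n",
               dc_qps, dc_qps * qp_bytes >> 10);
        return 0;
    }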
FDR InfiniBand Delivers Highest Application Performance
[Chart: message rate in millions of messages per second, FDR InfiniBand vs. QDR InfiniBand.]

Scalable and Cost Effective Topologies
3D Torus, Fat-Tree, Dragonfly+
Topologies / Capabilities Overview

Fat-Tree
• Can be fully rearrangeably non-blocking (1:1) or blocking (x:1)
• Typically enables the best performance and lowest latency
• Alleviates the bandwidth bottleneck closer to the root
• Most common topology in many supercomputer installations

3D Torus
• Blocking network, cost-effective for systems at scale
• Great performance for applications with locality
• Support for dedicated sub-networks
• Simple expansion for future growth
• Fault tolerant with multiple link failures
• Support for adaptive routing, congestion control, QoS
• Not limited to storage connection only at cube edges; no intermediate proprietary storage conversion

Dragonfly+
• Connects "groups" or "virtual routers" together in a full-graph (all-to-all) fashion
• Supports higher-dimensional networks
• Flexible definition of the intra-group interconnect
• Dragonfly+ offers higher scalability and is more cost-effective

UFM – Unified Fabric Management: Progression of Advanced Capabilities

UpDn Routing
• Based on minimum hops to each node
• Nodes ranked by distance from the root nodes
• Routes restricted by strict ranking rules (sketched below)
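As a rough illustration of the ranking step above, the sketch below assigns each switch a rank equal to its minimum hop distance from the root switches using a breadth-first search. The adjacency-matrix representation and the tiny example fabric are hypothetical; real Up/Down routing in a subnet manager involves considerably more (port selection, LID assignment):

    #include <stdio.h>

    #define MAX_NODES 1024

    /* adj[i][j] != 0 means switches i and j share a link. */
    static unsigned char adj[MAX_NODES][MAX_NODES];

    /* Rank every switch by its minimum hop count from the root switches
     * (rank 0 = root) using a breadth-first sweep. Up/Down routing then
     * forbids any route that turns back "up" (toward lower rank) after
     * it has gone "down". */
    static void updn_rank(int n, const int *roots, int nroots, int *rank)
    {
        int queue[MAX_NODES], head = 0, tail = 0;

        for (int i = 0; i < n; i++)
            rank[i] = -1;                     /* unvisited */
        for (int r = 0; r < nroots; r++) {
            rank[roots[r]] = 0;
            queue[tail++] = roots[r];
        }
        while (head < tail) {
            int u = queue[head++];
            for (int v = 0; v < n; v++)
                if (adj[u][v] && rank[v] < 0) {
                    rank[v] = rank[u] + 1;
                    queue[tail++] = v;
                }
        }
    }

    int main(void)
    {
        /* Tiny example fabric: root 0 over switches 1 and 2, leaf 3. */
        int n = 4, roots[1] = {0}, rank[MAX_NODES];

        adj[0][1] = adj[1][0] = 1;
        adj[0][2] = adj[2][0] = 1;
        adj[1][3] = adj[3][1] = 1;
        adj[2][3] = adj[3][2] = 1;

        updn_rank(n, roots, 1, rank);
        for (int i = 0; i < n; i++)
            printf("switch %d: rank %d\n", i, rank[i]);
        return 0;
    }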
Routing Chains
• Multiple systems can be routed with unique associated topologies, allowing more complex routing
• Support for port-group policies: a new policy file lets the user define "sub-topologies"

IB Routers
• Scalability beyond the LID space
• Administrative subnet isolation
• Fault isolation; IB routing between subnets
UFM in the Fabric
Multi-Site Management
• Synchronization and heartbeat between sites
• Software or appliance form factor
• Two or more instances for high availability
• Switch and HCA management
• Full Management or Monitoring Only modes
[Diagram: Site 1, Site 2, ... Site n]
UFM Main Features
• Automatic discovery
• Central device management
• Fabric dashboard
• Congestion analysis
• Health & performance monitoring
• Advanced alerting
• Fabric health reports
• Service-oriented provisioning

Integration with 3rd-Party Systems
Extensible architecture:
• Web-services-based API with read, write, and manage access
• Alerts via SNMP traps
• Open API for users or 3rd-party extensions: simple reporting, provisioning, and monitoring; task automation; Software Development Kit
• Extensible object model: user-defined fields and user-defined menus
• Integration points for monitoring systems, configuration management, orchestrators, job schedulers, and more
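As a sketch of what driving such a web-services API can look like, the snippet below issues a single authenticated HTTP GET with libcurl. The host name, credentials, and endpoint path are hypothetical placeholders, not documented UFM paths:

    #include <stdio.h>
    #include <curl/curl.h>

    /* Minimal sketch: query a fabric-management web-services endpoint.
     * URL and credentials are hypothetical placeholders. */
    int main(void)
    {
        CURL *curl;
        CURLcode rc;

        curl_global_init(CURL_GLOBAL_DEFAULT);
        curl = curl_easy_init();
        if (!curl)
            return 1;

        curl_easy_setopt(curl, CURLOPT_URL,
                         "https://ufm.example.com/ufmRest/resources/systems");
        curl_easy_setopt(curl, CURLOPT_USERPWD, "admin:password");
        /* The response body goes to stdout by default. */
        rc = curl_easy_perform(curl);
        if (rc != CURLE_OK)
            fprintf(stderr, "request failed: %s\n", curl_easy_strerror(rc));

        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return rc == CURLE_OK ? 0 : 1;
    }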
Mellanox ScalableHPC Accelerates Parallel Applications
[Diagram: MPI (scalable communication), SHMEM, and PGAS (logical shared memory) programming models running over the MXM and FCA libraries, which sit on the InfiniBand verbs API.]

MXM
• Reliable messaging optimized for Mellanox HCAs
• Hybrid transport mechanism
• Efficient memory registration
• Receive-side tag matching

FCA
• Topology-aware collective optimization
• Hardware multicast
• Separate virtual fabric for collectives
• CORE-Direct hardware offload
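For context, the collectives FCA accelerates are ordinary MPI calls; nothing changes at the application level. A minimal MPI_Allreduce example in C (build with any MPI compiler wrapper such as mpicc; the offload happens underneath when an FCA-enabled stack is in use):

    #include <stdio.h>
    #include <mpi.h>

    /* Each rank contributes its rank number; the library (optionally
     * offloaded via FCA/CORE-Direct) reduces them across all ranks. */
    int main(int argc, char **argv)
    {
        int rank, nprocs, local, sum;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        local = rank;
        MPI_Allreduce(&local, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum of ranks 0..%d = %d\n", nprocs - 1, sum);

        MPI_Finalize();
        return 0;
    }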
MXM v2.0 – Highlights
• Transport library integrated with Open MPI, OpenSHMEM, Berkeley UPC (BUPC), and MVAPICH2
• Supported transports: RC, UD, DC, RoCE, SHMEM (shared memory)
• Supported APIs (both sync and async): AM, p2p, atomics, synchronization
• Supported built-in mechanisms: tag matching, progress thread, memory registration cache, fast-path send for small messages, zero copy, flow control
• Supported data transfer protocols: Eager Send/Recv, Eager RDMA, Rendezvous
• Utilizes Mellanox offload engines
• More solutions will be added in the future

Mellanox FCA Collective Scalability
[Charts: Reduce and Barrier collective latency (us), and 8-byte broadcast bandwidth (KB*processes), versus number of processes (PPN=8), with and without FCA.]
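To make the data-transfer-protocol bullet above concrete: messaging libraries of this kind pick a wire protocol by message size. A toy sketch in C, with a purely hypothetical threshold (real libraries tune it per transport and expose it as a configuration knob):

    #include <stdio.h>

    #define EAGER_LIMIT 4096  /* hypothetical switch-over point, bytes */

    typedef enum { PROTO_EAGER, PROTO_RENDEZVOUS } proto_t;

    /* Small messages are copied into a pre-posted buffer and sent
     * immediately (eager); large ones first exchange a ready-to-send
     * handshake, then move the payload zero-copy (rendezvous). */
    static proto_t choose_protocol(size_t len)
    {
        return len <= EAGER_LIMIT ? PROTO_EAGER : PROTO_RENDEZVOUS;
    }

    int main(void)
    {
        size_t sizes[] = { 64, 4096, 65536 };
        for (int i = 0; i < 3; i++)
            printf("%zu bytes -> %s\n", sizes[i],
                   choose_protocol(sizes[i]) == PROTO_EAGER
                       ? "eager" : "rendezvous");
        return 0;
    }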
GPUDirect RDMA
[Diagram: with GPUDirect 1.0, transmit and receive between GPU memory and the InfiniBand adapter are staged through system memory via the CPU and chipset; with GPUDirect RDMA, the adapter reads and writes GPU memory directly, bypassing the CPU and system memory.]
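From the application's point of view, GPUDirect RDMA is consumed through a CUDA-aware MPI: the device pointer is handed straight to the MPI call. A minimal sketch in C, assuming a CUDA-aware MPI build such as MVAPICH2-GDR (error checking omitted):

    #include <stdio.h>
    #include <mpi.h>
    #include <cuda_runtime.h>

    /* Rank 0 sends a GPU-resident buffer to rank 1. With a CUDA-aware
     * MPI and GPUDirect RDMA, no staging copy through host memory is
     * needed; the HCA accesses GPU memory directly. */
    int main(int argc, char **argv)
    {
        int rank;
        const int n = 1024;
        float *dbuf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        cudaMalloc((void **)&dbuf, n * sizeof(float));

        if (rank == 0) {
            cudaMemset(dbuf, 0, n * sizeof(float));
            MPI_Send(dbuf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(dbuf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d floats into GPU memory\n", n);
        }

        cudaFree(dbuf);
        MPI_Finalize();
        return 0;
    }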
Preliminary Performance of MVAPICH2 with GPUDirect RDMA
GPU-GPU internode MPI latency (small messages, 1 byte to 4KB): 19.78us with MVAPICH2-1.9 vs. 6.12us with MVAPICH2-1.9-GDR – 69% lower latency (lower is better)
GPU-GPU internode MPI bandwidth (small messages, 1 byte to 4KB): 3X increase in throughput with MVAPICH2-1.9-GDR (higher is better)
Source: Prof. DK Panda
[Chart: execution time of the HSG (Heisenberg Spin Glass) application with 2 GPU nodes.]

FDR InfiniBand Switch Portfolio
Modular switches
• 648-port, 324-port, 216-port, 108-port
Edge switches
• SX6036 – 36 ports, managed
• SX6025 – 36 ports, externally managed
• SX6018 – 18 ports, managed
• SX6015 – 18 ports, externally managed (NEW)
• SX6012 – 12 ports, managed (NEW)
• SX6005 – 12 ports, externally managed
Capabilities include long-distance links, VPI bridging and routing, and management

Top-of-Rack (TOR) Ethernet Switch Portfolio
• SX1036 – 36x 40GbE, non-blocking TOR/aggregation
• SX1024 – 48x 10GbE + 12x 40GbE, 1.92Tbps non-blocking, 10GbE-to-40GbE aggregation
• SX1016 – 64x 10GbE, 1.28Tbps non-blocking
• SX1012 – 12x 40GbE non-blocking, 1U half-width (NEW)
Unique value proposition
• 12x 40GbE to 36x 40GbE, or 64x 10GbE – highest capacity in 1U
• 56GbE gives 40% better bandwidth
• VPI to connect to InfiniBand
Latency
• 220ns L2 latency
• 330ns L3 latency
Power (SX1036)
• Under 1W per 10GbE interface
• 2.3W per 40GbE interface
The Generation of Open Ethernet (Open Switch Platform)
• Centralized SDN controller (OpenFlow controller)
• Open-source applications: L3 routing (QUAGGA, RouteFlow), traffic engineering, security manager, power manager
• Open switch software stack on Linux: L2 switching, L3 routing, OpenFlow, and management

ConnectX-3 Pro – Dual-Port VPI Adapter
Low power
• Typical 2-port 10GigE: 2.9W
LOM/mezzanine features
• SMBus / NC-SI baseboard management controller interface, WoL
• Integrated sensors
Performance improvements
• Larger context caches
• Higher message rate
Virtualization
• NVGRE/VXLAN stateless offloads
• SR-IOV
10/20/40/56GbE, with a roadmap to 100GbE
End-to-end solution (NICs, switches, cables, software)
Small footprint, minimal peripherals
• FCBGA 17x17 package
• 1.0mm standard ball pitch
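As one concrete example of the virtualization features, SR-IOV virtual functions are typically enabled through the standard Linux sysfs attribute for the adapter's PCI device. A minimal C sketch; the interface name eth2 and the VF count are hypothetical:

    #include <stdio.h>

    /* Sketch: request SR-IOV virtual functions via the standard Linux
     * sysfs interface. The device path below is a hypothetical example. */
    int main(void)
    {
        const char *path = "/sys/class/net/eth2/device/sriov_numvfs";
        FILE *f = fopen(path, "w");

        if (!f) {
            perror("fopen");
            return 1;
        }
        fprintf(f, "4\n");   /* request four virtual functions */
        fclose(f);
        return 0;
    }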
Thank You
Phil Hofstad
philip@mellanox.com
Tel: +44 7597 566281