Join us and learn more about how the Dell PowerEdge C4140 rack server, powered by four NVIDIA V100s, the world’s most powerful GPU, addresses training and inference for the most demanding HPC, data visualization and AI workloads. This enables organizations to take advantage of the convergence of HPC and data analytics and realize advancements in areas including fraud detection, image processing, financial investment analysis and personalized medicine.
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
Dell and NVIDIA for Your AI workloads in the Data Center
1. Helge Gose, NVIDIA Solution Architect, June 7, 2018
DELL AND NVIDIA FOR YOUR AI WORKLOADS IN THE DATA CENTER
AGENDA
What is Deep Learning?
Volta and NVLINK
Inference to Training – Dell solutions
THE TIME HAS COME FOR GPU COMPUTING
[Chart, 1980–2020, log scale 10^3–10^7: single-threaded CPU performance grew ~1.5X per year, slowing to ~1.1X per year, while GPU-accelerated computing continues to scale]
DEEP LEARNING
IS SWEEPING ACROSS INDUSTRIES
INTERNET SERVICES: image/video classification, speech recognition, natural language processing
MEDICINE: cancer cell detection, diabetic grading, drug discovery
MEDIA & ENTERTAINMENT: video captioning, content-based search, real-time translation
SECURITY & DEFENSE: face recognition, video surveillance, cyber security
AUTONOMOUS MACHINES: pedestrian detection, lane tracking, traffic sign recognition
A NEW COMPUTING MODEL
Algorithms that learn from examples
TRADITIONAL APPROACH (MACHINE LEARNING)
Requires domain experts
Time-consuming experimentation
Custom algorithms
Not scalable to new problems
DEEP NEURAL NETWORKS (DEEP LEARNING)
Learn from data
Easy to extend
Accelerated with GPUs
[Diagram: both approaches map an input image to the labels Car / Vehicle / Coupe]
WHAT PROBLEM ARE YOU SOLVING?
Defining the AI/DL Task
BUSINESS QUESTION → AI/DL TASK, with example outputs (HEALTHCARE / RETAIL / FINANCE):
Is “it” present or not? → Detection (Cancer Detection / Targeted Ads / Cybersecurity)
What type of thing is “it”? → Classification (Image Classification / Basket Analysis / Credit Scoring)
To what extent is “it” present? → Segmentation (Tumor Size/Shape Analysis / Build 360º Customer View / Credit Risk Analysis)
What is the likely outcome? → Prediction (Survivability Prediction / Sentiment & Behavior Recognition / Fraud Detection)
What will likely satisfy the objective? → Recommendations (Therapy Recommendation / Recommendation Engine / Algorithmic Trading)
INPUTS: text, data, images, audio, video
TESLA V100
WORLD’S MOST ADVANCED DATA CENTER GPU
5,120 CUDA cores
640 NEW Tensor cores
7.8 FP64 TFLOPS | 15.7 FP32 TFLOPS | 125 Tensor TFLOPS
20MB SM RF | 16MB Cache
16GB/ 32GB HBM2 @ 900GB/s | 300GB/s NVLink
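The 125 Tensor TFLOPS figure follows from the core count and clock: each Tensor core performs a 4x4x4 matrix multiply-accumulate (64 FMAs, i.e. 128 floating-point operations) per clock. A quick back-of-the-envelope check, assuming the SXM2 V100's ~1530 MHz boost clock:

```python
# Back-of-the-envelope check of the Tesla V100 Tensor TFLOPS figure.
# Assumption: ~1530 MHz boost clock; each Tensor core performs a 4x4x4
# matrix FMA per clock = 64 multiply-accumulates = 128 floating-point ops.
tensor_cores = 640
flops_per_core_per_clock = 128
boost_clock_hz = 1.53e9

tensor_tflops = tensor_cores * flops_per_core_per_clock * boost_clock_hz / 1e12
print(f"Per-GPU Tensor TFLOPS: {tensor_tflops:.0f}")    # ~125

# Four V100s in a 1U C4140 give the "up to 500 TFLOPS/U" system figure.
print(f"4-GPU system: {4 * tensor_tflops:.0f} TFLOPS")  # ~501
```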
REVOLUTIONARY AI PERFORMANCE
3X Faster DL Training Performance
3X Reduction in Time to Train Over P100
[Chart: relative time to train, LSTM-based Neural Machine Translation, 13 epochs, German→English, WMT15 subset; CPU = 2x Xeon E5-2699 v4: 15 days, P100: 18 hours, V100: 6 hours]
Over 80X DL Training Performance in 3 Years
[Chart: GoogleNet training speedup vs 1x K80 + cuDNN2: 1x K80/cuDNN2 (Q1 2015) = 1x, 4x M40/cuDNN3 (Q3 2015), 8x P100/cuDNN6 (Q2 2016), 8x V100/cuDNN7 (Q2 2017) = over 80x]
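The speedups behind those training times are simple ratios; a quick sanity check using the chart's figures:

```python
# Sanity-check the NMT training-time speedups quoted in the chart.
cpu_hours = 15 * 24      # 2x Xeon E5-2699 v4: 15 days
p100_hours = 18
v100_hours = 6

print(f"P100 vs CPU:  {cpu_hours / p100_hours:.0f}x")   # 20x
print(f"V100 vs P100: {p100_hours / v100_hours:.0f}x")  # 3x, the claimed reduction
print(f"V100 vs CPU:  {cpu_hours / v100_hours:.0f}x")   # 60x
```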
END-TO-END PRODUCT FAMILY
TRAINING
Desktop: TITAN V
Data center: Tesla V100, Dell PowerEdge C4140, DGX Station
INFERENCE
Data center: Tesla V100, Tesla P4
Embedded: Jetson (JetPack SDK)
Automotive: Drive PX (DriveWorks SDK)
POWERING THE DEEP LEARNING ECOSYSTEM
NVIDIA SDK Accelerates Every Major Framework
COMPUTER VISION
OBJECT DETECTION IMAGE CLASSIFICATION
SPEECH & AUDIO
VOICE RECOGNITION LANGUAGE TRANSLATION
NATURAL LANGUAGE PROCESSING
RECOMMENDATION ENGINES SENTIMENT ANALYSIS
DEEP LEARNING FRAMEWORKS
Mocha.jl
NVIDIA DEEP LEARNING SDK
developer.nvidia.com/deep-learning-software
PowerEdge C4140 Server
Faster time to insights with ultra-dense accelerator optimized server platform
* Based on Dell internal analyses and Principled Technologies Report - Jan 2015.
TARGETED WORKLOADS
THE BEDROCK OF THE MODERN DATACENTER
• Machine Learning and Deep Learning
• Technical Computing (Research / Life Sciences)
• Low-latency, high-performance applications (FSI)
Key Capabilities
• Unthrottled performance and superior thermal efficiency with patent-pending interleaved GPU system design*
• No-compromise (CPU + GPU) acceleration technology up to 500 TFLOPS/U+ using the NVIDIA® Tesla™ V100 with NVLink™
• 2.4KW PSUs help future-proof for next-generation GPUs
• Simplified deployment with pre-configured Ready Bundles
Xeon Scalable Processors + Tesla GPUs
+Based on V100 NVLink Tensor Core performance
C4140 – Now with NVIDIA® Volta™ and NVLink™
Faster time to insights with ultra-dense accelerator-optimized server platform
NVIDIA® Volta GPU has over 21 billion transistors and 640 Tensor cores to deliver 100+ TFLOPS
NVIDIA® NVLink™ is a high-bandwidth interconnect enabling ultra-fast communication between CPU and GPU and between GPUs
Volta V100 delivers a 2.6X average speedup on DL workloads over Pascal P100
Delivers 44X more throughput than CPU nodes, with lower latency
NVLink is 5X – 10X faster than the traditional PCIe Gen3 interconnect
Volta-optimized software for important HPC applications
*Source: NVIDIA® Volta benchmarks for multiple applications 2017
C4140 and NVLink™
NVLink Topology
NVLink runs at 25Gbps per lane versus PCIe Gen3 at 8Gbps
Increase in performance due to higher clock speed: ~7%
Increase in performance from peer-to-peer GPU communication: 7%+
PCIe Topology
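Those per-lane rates translate into the aggregate bandwidth figures quoted elsewhere in this deck. A rough calculation, assuming NVLink 2.0 on V100 (6 links, 8 lanes per link per direction) and PCIe Gen3 x16 with 128b/130b encoding:

```python
# Rough aggregate-bandwidth comparison: NVLink (V100) vs PCIe Gen3 x16.
# Assumptions: NVLink 2.0 = 25 Gbps/lane, 8 lanes per link per direction,
# 6 links per V100; PCIe Gen3 = 8 GT/s per lane, x16, 128b/130b encoding.
nvlink_link_gbs = 25e9 * 8 / 8e9             # GB/s per link, per direction = 25
nvlink_total_gbs = nvlink_link_gbs * 6 * 2   # 6 links, both directions = 300
pcie_gbs = 8e9 * (128 / 130) * 16 / 8e9 * 2  # x16, both directions ~= 31.5

print(f"NVLink aggregate: {nvlink_total_gbs:.0f} GB/s")  # matches 300GB/s spec
print(f"PCIe Gen3 x16:    {pcie_gbs:.1f} GB/s")
print(f"Ratio: {nvlink_total_gbs / pcie_gbs:.1f}x")      # within the 5X-10X claim
```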
Towers, Racks, Modular
Extreme Scale
Infrastructure
INDUSTRY'S #1
Server Portfolio
PowerEdge
*Based on units sold (tie). IDC Worldwide Quarterly Server Tracker, Q1-Q3, 2016.
OpenManage Enterprise – Intelligent Automation Systems Management
Now Introducing C4140
ACCELERATE YOUR BUSINESS ON
PowerEdge
ADAPT AND SCALE to meet your dynamic business needs by leveraging Scalable Business Architecture
FREE UP SKILLED RESOURCES and focus on core business with Intelligent Automation
PROTECT YOUR CUSTOMERS and your business robustly with Integrated Security
First, let’s start with some definitions…
AI is a broad field of study focused on using computers to do things that require human-level intelligence. It’s been around since the 1950s, playing games like tic-tac-toe and checkers, and inspiring scary sci-fi movies. But it was limited in practical applications…
ML is an approach to AI that uses statistical techniques to construct a model from observed data. It generally relies on human-defined classifiers or “feature extractors,” which can be as simple as a linear regression or the slightly more complicated “Bag of Words” analysis technique that made email spam filters possible.
This was really handy in the late 1980s, when lots of email started showing up in your inbox.
But then we invented smartphones, webcams, social media services, and all kinds of sensors that generate huge mountains of data, creating the new challenge of understanding and extracting insights from all this “big data”.
DL is a ML technique that automates the creation of feature extractors, using large amounts of data to train complex “deep neural networks”.
DNNs are capable of achieving human-level accuracy for many tasks, but require tremendous computational power to train
Several years ago, researchers started applying DNNs in a variety of areas and reporting amazing results…
==============
Ref. https://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering
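The Bag-of-Words spam filtering referenced above can be sketched as a tiny Naive Bayes classifier. This is an illustrative toy with made-up training messages, not a production filter:

```python
# Toy Naive Bayes spam filter over bag-of-words features (stdlib only).
from collections import Counter
import math

def train(messages):
    """messages: list of (text, label) pairs, label 'spam' or 'ham'."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for text, label in messages:
        counts[label].update(text.lower().split())
        totals[label] += 1
    return counts, totals

def classify(text, counts, totals):
    vocab = set(counts["spam"]) | set(counts["ham"])
    scores = {}
    for label in ("spam", "ham"):
        # log prior + log likelihoods with add-one (Laplace) smoothing
        score = math.log(totals[label] / sum(totals.values()))
        denom = sum(counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for tomorrow", "ham"),
    ("lunch plans this week", "ham"),
]
counts, totals = train(training)
print(classify("claim your free money", counts, totals))    # spam
print(classify("agenda for lunch meeting", counts, totals)) # ham
```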
Now that we’ve seen a few examples of applications that benefit from Deep Learning, and have a basic understanding of why we are seeing such rapid adoption across a wide range of use cases…
Let’s explore how deep learning works by comparing it with earlier approaches to machine learning.
Consider the traditional approach to performing computer vision tasks such as image classification or object detection.
A domain expert trained in computer vision comes up with a set of rules to extract features from the image – such as edges, corners, color information, … maybe even counting the number of wheels, headlights, etc. The expert must figure out which features in the data are important, implement these rules by hand-writing custom software routines, and figure out how all the rules should be connected in relation to each other to perform the task. As you can imagine, this can be tedious and require lots of trial and error. And if the data changes to incorporate different types of objects or environments, then it’s back to the drawing board. All of this results in tons of source code to write, debug and maintain.
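To make that contrast concrete, here is what one such hand-written rule might look like: a made-up feature extractor that counts strong horizontal-gradient pixels as an "edge" feature. The rule and its threshold are arbitrary expert choices, which is exactly the hand-tuning described above:

```python
# A hand-crafted "feature extractor" in the traditional computer-vision style:
# count pixels whose horizontal gradient exceeds a hand-picked threshold.
# Both the rule and the threshold are choices an expert would have to tune.
def edge_feature(image, threshold=50):
    """image: 2D list of grayscale values (0-255). Returns edge-pixel count."""
    edges = 0
    for row in image:
        for x in range(len(row) - 1):
            if abs(row[x + 1] - row[x]) > threshold:
                edges += 1
    return edges

# A tiny 3x4 "image" with one sharp vertical boundary in each row.
image = [
    [10, 10, 200, 200],
    [12, 11, 198, 201],
    [ 9, 10, 205, 199],
]
print(edge_feature(image))  # 3: one strong horizontal gradient per row
```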
[next]
In contrast, when you use the deep learning approach, the neural network model learns the rules for performing the task directly from the data. No hand-written custom feature extractors are required. You simply feed the deep neural network thousands of examples, which serve as the “experience” from which it learns how to perform the task automatically.
The advantages of using deep learning include the ability to extend and adapt to new data simply by retraining the network, the immense performance improvements from using NVIDIA GPU accelerators, and the opportunity for more people to develop AI applications.
As a result, the deep learning approach can be more accurate, with significantly less human effort.
It’s worth noting that some people who are comfortable using previous approaches to machine learning can find it challenging to apply deep learning, since many of the instincts and assumptions from their own hard-won experience need to change in order to effectively develop deep learning applications.
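The "learn the rules directly from the data" idea can be reduced to a minimal toy: a single-neuron classifier (logistic regression) that learns a decision rule from labeled 2-D points by gradient descent, with no hand-written rules. The dataset and hyperparameters here are invented for illustration:

```python
# Minimal "learning from examples": a single neuron (logistic regression)
# trained by gradient descent on a toy, linearly separable 2-D dataset.
import math

# Toy labeled examples: (x1, x2) -> class 0 or 1 (invented for illustration).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0),
        ((2, 2), 1), ((2, 3), 1), ((3, 2), 1)]

w1 = w2 = b = 0.0
lr = 0.5
for _ in range(1000):                                      # training loop
    for (x1, x2), y in data:
        p = 1 / (1 + math.exp(-(w1 * x1 + w2 * x2 + b)))   # predicted prob
        err = p - y                                        # log-loss gradient
        w1 -= lr * err * x1
        w2 -= lr * err * x2
        b -= lr * err

predict = lambda x1, x2: int(w1 * x1 + w2 * x2 + b > 0)
print([predict(x1, x2) for (x1, x2), _ in data])  # [0, 0, 0, 1, 1, 1]
```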
And deep learning works not only for computer vision tasks such as image classification, object detection, and image segmentation… But also for non-visual tasks such as fraud detection, speech recognition, behavior prediction, and product recommendations.
It can be helpful to think about deep learning as a way to map samples from an input domain to an output domain.
The input domain can be text data such as log files or financial data, images of pretty much anything, audio or other signals, or video streams (which are really just a sequence of images with synchronized audio). It can even be three-dimensional images or datasets collected from medical imaging devices, geophysical analyses, cosmological models, or molecular dynamics simulations.
The output domain is determined by the question you want to ask of the input data, and the question itself indicates the type of deep learning task you need to perform in order to map the input domain to the output domain.
This table shows a sampling of use cases where deep learning can be applied in healthcare.
For example:
If the question requires a Yes/No answer telling you whether something is present, the task is Detection.
If the question requires an answer describing what types of things are in each input, the task is Classification.
If the question requires a shape or volume as an answer, the task is Segmentation.
And so on…
Depending on the application, you may need to use a combination of tasks to achieve more sophisticated outputs.
For example, to automatically label all the faces in your family photos you’d need to first detect whether and where there are faces in the picture, and then apply facial recognition (which is a form of classification) to determine the name of the person associated with each face.
And for automatic language translation you could use speech-to-text (classification) followed by translation of text in one language to text in another language (prediction) and then speech synthesis (prediction).
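A pipeline like that is just function composition over task outputs. The sketch below wires up stand-in stage functions; the names and canned outputs are invented, and a real system would invoke a trained network at each stage:

```python
# Sketch of composing DL tasks into a pipeline, using stand-in stubs.
# Function names and canned outputs are purely illustrative.

def speech_to_text(audio):            # classification stage (stub)
    return "hello world"

def translate(text, target="de"):     # prediction stage (stub)
    return {"hello world": "hallo welt"}.get(text, text)

def synthesize(text):                 # prediction stage (stub)
    return f"<audio:{text}>"

def translation_pipeline(audio, target="de"):
    """speech-to-text -> translation -> speech synthesis."""
    return synthesize(translate(speech_to_text(audio), target))

print(translation_pipeline(b"raw-audio-bytes"))  # <audio:hallo welt>
```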
Check out the next module in this series to learn how these tasks are actually built using input data and deep learning frameworks to train deep neural networks.
======== vet this later ========
Another example you may have experienced yourself is the feature in Google Maps where the images captured in Street View are used to describe the type of business (classification), find the little sign near the door (detection), and then automatically read the sign (classification+prediction) so the days and times the business is open can be published online.
===========================
There are a wide range of GPU-accelerated platforms you can use to accelerate deep learning training and inference application workloads.
If you want a fully integrated solution, we recommend the DGX-1, a supercomputer in a box that delivers the performance equivalent of hundreds of CPU-only servers using 8 world-class Tesla GPUs, or its little brother the DGX Station, which is powered by 4 Tesla GPUs and runs whisper-quiet next to your desk.
If you just want to get started on a prototype using your existing workstation, the TITAN V includes new Tensor Cores designed specifically for deep learning that deliver up to 12x higher peak TFLOPS for training.
In the datacenter, the Tesla P100 and V100 with NVLink Technology deliver strong scaling support for mixed workloads across both HPC applications and Deep Learning training & inference applications.
And for scale-out inference workloads the Tesla P4 (GP104) supports high efficiency (perf/watt) low-latency performance.
And, of course, if you need to deploy deep learning applications in automotive or embedded applications, NVIDIA offers the DrivePX and Jetson platforms.
======================
NVIDIA also provides a wide range of GPU-accelerated platforms you can use to accelerate deep learning training and inference application workloads.
If you want a fully integrated solution, we recommend the DGX-1, a supercomputer in a box that delivers the performance equivalent of 250 CPU-only servers, or its little brother the DGX Station, which runs whisper-quiet next to your desk.
If you just want to get started on a prototype using your existing workstation, the Titan X Pascal supports fast 32-bit floating point (FP32) and 8-bit integer (INT8) performance for deep learning applications.
In the datacenter, the Tesla P100 and V100 with NVLink Technology deliver strong scaling support for mixed workloads across both HPC applications and Deep Learning training & inference workloads (using FP64, FP32, and FP16).
And for scale-out inference workloads the Tesla P4 (GP104) supports high efficiency (perf/watt) low-latency performance with fast FP32 and INT8.
And, of course, if you need to deploy deep learning applications in automotive or embedded applications, NVIDIA offers the DrivePX and Jetson platforms.
======================
What happens after development? NGC is tuned for all of these platforms. Once you have created your AI solution and want to productize it, there is a seamless deployment path: to cloud-based microservices in the form of the NGC TensorRT container, or out to embedded devices. Take these models into JetPack and DriveWorks for robotics, drones, autonomous vehicles, and more.
Key Points –
The C4140 is made possible by superior system engineering from Dell EMC and best-of-breed technologies from our strategic partners: Intel providing the latest Xeon Scalable Processors and NVIDIA providing the latest Tesla GPUs.
While the C4140 is designed for complex technical and cognitive computing workloads, it is targeted at the following markets:
AI / DL / ML / HPC
Life Sciences
Financial Services
Oil and Gas Exploration
Some of the capabilities include:
A no-compromise system design that provides superior speed of up to 500 TFLOPS/U
A patent-pending interleaved GPU design that enables ultra-density with unthrottled performance
Future-proofing the server platform for next-gen GPUs by supporting 2.4 KW PSUs
Will be a critical component of Ready Bundles for ML/DL
Systems Management – iDRAC9, connection view, System lockdown, OpenManage Power Center
Other 14G Features & Benefits – Systems management, Security and Intel performance boost
Key Points –
Highlight how C4140 is better with latest technology from NVIDIA
New Volta V100 is better than Pascal P100 for all HPC applications, ranging from 1.5X – 5X
Importance of ecosystem to drive application adoption and support
Key Points –
C4140 supports 2 key topologies – PCIe & NVLink
More at NVLink - http://www.nvidia.com/object/nvlink.html
Patent-pending interleaved design applies only to the PCIe topology
NVLink is a proprietary technology from NVIDIA that allows direct GPU-to-GPU, peer-to-peer communication.
With PCIe, GPU-to-GPU communication happens only through the PCIe switch
NVLink allows direct GPU-to-GPU communication, resulting in increased performance
NVLink also has a higher clock speed, resulting in higher performance
Imperial College of London
“By choosing Dell PowerEdge C4130 servers, we gained the same amount of processing performance in 4U of rack space as our existing HPC solution, which runs across two full height racks.” - Dr. Peter Vincent, Department of Aeronautics
Texas Advanced Computing Center (TACC): Dell and EMC are the two strategic partners providing the technology that makes up the core of Wrangler. Wrangler uses EMC's DSSD rack-scale flash technology to ensure speed and performance, enabling real-time analytics at scale. Source.
Result: Significantly accelerates data-centered science
Translational Genomics Research Institute (TGen) develops early diagnostics, prognostics, and therapies for cancer, neurological disorders, diabetes, and other complex diseases.
Dell: Servers, storage, networking and infrastructure consulting. Source.
EMC: EMC Isilon scale-out cluster and Ocarina Networks compression and dedupe software, Source.
Result: Researchers can create more targeted treatments at least one week faster
James B. Hunt Jr. Library at North Carolina State University uses an HPC cluster to support large-scale visualization and builds collaborative learning spaces to inspire students. Dell: Servers, storage, networking, workstations. Source.
EMC: Storage technologies, including its Isilon scale-out NAS solutions, to power Hunt Library’s private cloud, virtual desktop, and virtual server infrastructure. Source.
Result: Supports virtualized infrastructure for remote access, easier desktop management, and cost savings
University of Aberdeen centralises its high-performance computing infrastructure, giving scientists the tools they need to drive innovative healthcare research.
Dell: New cluster based on Dell PowerEdge servers, QLogic and Dell Networking switches. Source.
EMC: EMC Celerra Unified Storage, EMC Centera Gen 4x2, EMC File Management Appliance, EMC Celerra Replicator, EMC Data Protection Advisor, EMC PowerPath, EMC RecoverPoint, EMC SnapView, EMC Virtual Provisioning, EMC Unisphere Management Console. Source.
Result: Scientists have resources to drive groundbreaking healthcare research
Max Planck Institute of Molecular Cell Biology and Genetics helps researchers better understand cell division by rendering 3D microtubule data. Dell: Servers, virtualization, storage. Source.
A joint venture between the Chinese Academy of Sciences and Max Planck Gesellschaft, the Partner Institute for Computational Biology (PICB) was established in 2005 and works on the interface between biological theory and modeling. EMC: EMC Isilon NL400, EMC Isilon X200, EMC Isilon SmartPools, EMC Isilon SmartQuotas, EMC Isilon SnapshotIQ. Source.
Results: Researchers gain 3D representations in minutes rather than hours; data processing speeds increased by 10x
Key Takeaway – Dell EMC PowerEdge is the market leader across all types of form factors and has industry leading performance across multiple verticals.