GPU ACCELERATED DEEP LEARNING
WITH CUDNN
Larry Brown, Ph.D.
March 2015
AGENDA
1 Introducing cuDNN and GPUs
2 Deep Learning Context
3 cuDNN V2
4 Using cuDNN
Introducing cuDNN and GPUs
HOW GPU ACCELERATION WORKS
Application code is split between the CPU and the GPU. The compute-intensive functions (roughly 5% of the code) are offloaded to the GPU, while the rest of the sequential code keeps running on the CPU. Those compute-intensive functions typically account for ~80% of the run-time.
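The offload pattern described above can be made concrete with a minimal CUDA C++ sketch. The kernel, array names, and sizes below are hypothetical and only illustrate moving the compute-intensive part of a program onto the GPU; error checking is omitted.

```cpp
// Minimal sketch of the offload pattern: the compute-intensive function (a hypothetical
// element-wise "saxpy") runs on the GPU, the rest of the program stays on the CPU.
#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per element
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> hx(n, 1.0f), hy(n, 2.0f);

    float *dx, *dy;                                   // device copies of the hot data
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, dx, dy); // the compute-intensive ~5% of the code
    cudaMemcpy(hy.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("y[0] = %f\n", hy[0]);                     // rest of the sequential CPU code
    cudaFree(dx); cudaFree(dy);
    return 0;
}
```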
WHAT IS CUDNN?
cuDNN is a library of primitives for deep learning.
There are three ways to accelerate applications: libraries ("drop-in" acceleration), OpenACC directives (easily accelerate applications), and programming languages (maximum flexibility). cuDNN belongs to the libraries category.
ANALOGY TO HPC
cuDNN is a library of primitives for deep learning.
In HPC, applications such as fluid dynamics and computational physics codes are written against the standard BLAS interface. Beneath that interface sit various CPU BLAS implementations running on IBM Power and Intel CPUs, and cuBLAS/NVBLAS running on GPUs (Tesla, Titan, TK1, TX1).
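The analogy can be shown in code: the same SGEMM an HPC application would issue through a CPU BLAS can be issued through cuBLAS. This is a hedged illustration only; matrix sizes and names are made up and error checking is omitted.

```cpp
// Sketch of the BLAS analogy: a standard SGEMM issued through cuBLAS.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

int main() {
    const int m = 256, n = 256, k = 256;
    std::vector<float> hA(m * k, 1.0f), hB(k * n, 1.0f), hC(m * n, 0.0f);

    float *dA, *dB, *dC;
    cudaMalloc(&dA, hA.size() * sizeof(float));
    cudaMalloc(&dB, hB.size() * sizeof(float));
    cudaMalloc(&dC, hC.size() * sizeof(float));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), hB.size() * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    // C = alpha * A * B + beta * C  (column-major, as in standard BLAS)
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                &alpha, dA, m, dB, k, &beta, dC, m);

    cudaMemcpy(hC.data(), dC, hC.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```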
DEEP LEARNING WITH CUDNN
cuDNN is a library of primitives for deep learning.
The deep learning stack mirrors the HPC stack: applications sit on top of frameworks, the frameworks call into cuDNN, and cuDNN runs on GPUs (Tesla, TX1, Titan).
ANNOUNCING CUDNN V2
cuDNN V2 is focused on performance and features for the deep learning practitioner!
Optimized for current and future GPUs.
Deep Learning Context
ACCELERATING MACHINE LEARNING
CUDA for Deep Learning
"Machine Learning" is in some sense a rebranding of AI. The focus is now on more specific, often perceptual tasks, and there are many successes. Today, some of the world's largest internet companies, as well as the foremost research institutions, are using GPUs for machine learning.
MACHINE LEARNING USE CASES
Image classification, object detection, localization
Face recognition
Speech & natural language processing
Medical imaging & interpretation
Seismic imaging & interpretation
Recommendation
…machine learning is pervasive
WHY IS DEEP LEARNING HOT NOW?
THREE DRIVING FACTORS…
1 - Big Data Availability: 350 million images uploaded per day, 2.5 petabytes of customer data hourly, 100 hours of video uploaded every minute
2 - New ML Techniques: deep neural networks
3 - Compute Density: GPUs
ML systems extract value from Big Data
DIFFERENT MODALITIES…SAME APPROACH
Images/video: image → vision features → detection
Audio: audio → audio features → speaker ID
Text: text → text features → text classification, machine translation, information retrieval, …
Slide courtesy of Andrew Ng, Stanford University
DEEP LEARNING ADVANTAGES
Other machine learning approaches: linear classifiers, regression, support vector machines, Bayesian methods, decision trees, association rules, clustering.
Deep Learning:
• Don't have to figure out the features ahead of time!
• Use the same neural net approach for many different problems.
• Fault tolerant.
• Scales well.
WHAT IS DEEP LEARNING?
(Diagram: an input is passed through a many-layer network to produce a result.)
Today's Largest Networks
~10 layers
1B parameters
10M images
~30 Exaflops
~30 GPU days
Human brain has trillions of parameters – only 1,000x more.
CLASSIFICATION WITH DNNS
Training (Development): the network learns from labeled examples (cars, buses, trucks, motorcycles).
Inference (Production): the trained network classifies a new image, e.g. "truck".
WHY ARE GPUS GREAT FOR DEEP LEARNING?
[Lee, Ranganath & Ng, 2007]
Neural networks and GPUs are a natural match: both are inherently parallel, both are built around matrix operations, and both thrive on high FLOPS.
GPUs deliver:
same or better prediction accuracy
faster results
smaller footprint
lower power
CONVOLUTIONAL NEURAL NETWORKS
• Biologically inspired.
• Each neuron is connected only to a small region of the layer below it, called the filter or receptive field.
• A given layer can have many convolutional filters/kernels. Each filter has the same weights across the whole layer (a plain C++ sketch of this follows the list).
• Bottom layers are convolutional, top layers are fully connected.
• Generally trained via supervised learning.
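To make the receptive-field and weight-sharing bullets concrete, here is a plain C++ sketch of a single-channel 2D convolution. This is illustrative code, not the cuDNN implementation; sizes and names are made up.

```cpp
// Plain C++ sketch of one 2D convolution layer: single channel, stride 1, no padding.
// Each output element reads only a small KxK window of the input (the receptive field),
// and the same kernel weights are reused at every position in the layer.
#include <vector>

std::vector<float> conv2d(const std::vector<float>& in, int H, int W,
                          const std::vector<float>& kernel, int K) {
    int outH = H - K + 1, outW = W - K + 1;
    std::vector<float> out(outH * outW, 0.0f);
    for (int y = 0; y < outH; ++y)
        for (int x = 0; x < outW; ++x) {
            float acc = 0.0f;
            for (int ky = 0; ky < K; ++ky)          // slide the shared KxK filter
                for (int kx = 0; kx < K; ++kx)
                    acc += in[(y + ky) * W + (x + kx)] * kernel[ky * K + kx];
            out[y * outW + x] = acc;                 // one output "neuron"
        }
    return out;
}
```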
CONVOLUTIONAL NETWORKS BREAKTHROUGH
Convolutional net examples:
Y. LeCun et al., 1989-1998: handwritten digit reading
A. Krizhevsky, G. Hinton et al., 2012: ImageNet classification winner
CNNS DOMINATE IN PERCEPTUAL TASKS
Slide credit: Yann LeCun, Facebook & NYU
GPUS – THE PLATFORM FOR MACHINE LEARNING
Image Recognition Challenge (hosted by ImageNet): 1.2M training images • 1000 object categories
(Example detections: person, car, helmet, motorcycle, bird, frog, dog, chair, hammer, flower pot, power drill.)
GPU entries in the challenge grew from 4 to 60 to 110 over 2012-2014.
Classification error rates fell from 28% (2010) and 26% (2011) to 16% (2012), 12% (2013) and 7% (2014).
GPUS MAKE DEEP LEARNING ACCESSIBLE
"Deep learning with COTS HPC systems", A. Coates, B. Huval, T. Wang, D. Wu, A. Ng, B. Catanzaro, ICML 2013
GOOGLE DATACENTER: 1,000 CPU servers • 2,000 CPUs • 16,000 cores • 600 kWatts • $5,000,000
STANFORD AI LAB: 3 GPU-accelerated servers • 12 GPUs • 18,432 cores • 4 kWatts • $33,000
"Now You Can Build Google's $1M Artificial Brain on the Cheap"
cuDNN version 2
CUDNN DESIGN GOALS
Basic deep learning subroutines: allow the user to write a DNN application without any custom CUDA code
Flexible layout: handle any data layout (a descriptor sketch follows this list)
Memory/performance tradeoff: good performance with minimal memory use, great performance with more memory use
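As one hedged illustration of the "flexible layout" goal, a 4D tensor is described to cuDNN through its sizes plus explicit strides, so the library can consume whatever layout the application already has in memory. The sizes below are made up, and the exact descriptor API should be checked against the cuDNN version in use.

```cpp
// Sketch: describing a 4D tensor to cuDNN by sizes plus explicit strides.
#include <cudnn.h>

int main() {
    cudnnHandle_t handle;
    cudnnCreate(&handle);

    const int n = 128, c = 96, h = 55, w = 55;       // minibatch, feature maps, height, width

    cudnnTensorDescriptor_t desc;
    cudnnCreateTensorDescriptor(&desc);
    // Fully packed NCHW layout, expressed through explicit strides.
    cudnnSetTensor4dDescriptorEx(desc, CUDNN_DATA_FLOAT, n, c, h, w,
                                 c * h * w,  /* stride between images            */
                                 h * w,      /* stride between feature maps      */
                                 w,          /* stride between rows              */
                                 1);         /* stride between adjacent elements */

    cudnnDestroyTensorDescriptor(desc);
    cudnnDestroy(handle);
    return 0;
}
```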
CUDNN ROUTINES
Convolutions – 80-90% of the execution time
Pooling - Spatial smoothing
Activation - Pointwise non-linear function
CONVOLUTIONS – THE MAIN WORKLOAD
Very compute intensive, but with a large parameter space
Layout and configuration variations
Other cuDNN routines have straightforward implementations
A convolution is parameterized by the following (a descriptor-setup sketch follows this list):
1 Minibatch Size
2 Input feature maps
3 Image Height
4 Image Width
5 Output feature maps
6 Kernel Height
7 Kernel Width
8 Top zero padding
9 Side zero padding
10 Vertical stride
11 Horizontal stride
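The sketch below shows how those eleven parameters map onto cuDNN's tensor, filter, and convolution descriptors. The numeric values are purely illustrative, and the signatures shown follow more recent cuDNN releases (they differ slightly in the V2-era API), so treat this as an assumption-laden sketch rather than the presentation's own code.

```cpp
// Mapping the convolution parameter space onto cuDNN descriptors (illustrative values).
#include <cudnn.h>

int main() {
    cudnnHandle_t handle;
    cudnnCreate(&handle);

    // 1 minibatch size, 2 input feature maps, 3 image height, 4 image width
    cudnnTensorDescriptor_t xDesc;
    cudnnCreateTensorDescriptor(&xDesc);
    cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               /*n=*/128, /*c=*/3, /*h=*/224, /*w=*/224);

    // 5 output feature maps, 6 kernel height, 7 kernel width
    cudnnFilterDescriptor_t wDesc;
    cudnnCreateFilterDescriptor(&wDesc);
    cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW,
                               /*k=*/64, /*c=*/3, /*kh=*/7, /*kw=*/7);

    // 8-9 zero padding, 10-11 strides
    cudnnConvolutionDescriptor_t convDesc;
    cudnnCreateConvolutionDescriptor(&convDesc);
    cudnnSetConvolution2dDescriptor(convDesc, /*pad_h=*/3, /*pad_w=*/3,
                                    /*stride_h=*/2, /*stride_w=*/2,
                                    /*dilation_h=*/1, /*dilation_w=*/1,
                                    CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);

    cudnnDestroyConvolutionDescriptor(convDesc);
    cudnnDestroyFilterDescriptor(wDesc);
    cudnnDestroyTensorDescriptor(xDesc);
    cudnnDestroy(handle);
    return 0;
}
```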
CUDNN V2 - PERFORMANCE
(Benchmark chart omitted; test system:)
CPU: 16-core Haswell E5-2698 at 2.3 GHz, with 3.6 GHz Turbo
GPU: NVIDIA Titan X
CUDNN V2 FLEXIBILITY
Can now specify a strategy the library will use to select the best
convolution algorithm:
PREFER_FASTEST
NO_WORKSPACE
SPECIFY_WORKSPACE_LIMIT
…or specify an algorithm directly (a selection sketch follows this list)…
GEMM
IMPLICIT_GEMM
IMPLICIT_PRECOMP_GEMM
DIRECT
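A hedged sketch of what the strategy-based selection looks like in code is below. cudnnGetConvolutionForwardAlgorithm took a preference argument of this kind from cuDNN v2 through cuDNN 7 (it was later removed in cuDNN 8); the descriptors are assumed to be set up as in the earlier sketch.

```cpp
// Sketch of strategy-based convolution algorithm selection (cuDNN v2 - v7 era API).
#include <cudnn.h>
#include <cstdio>

void pick_algorithm(cudnnHandle_t handle,
                    cudnnTensorDescriptor_t xDesc, cudnnFilterDescriptor_t wDesc,
                    cudnnConvolutionDescriptor_t convDesc, cudnnTensorDescriptor_t yDesc) {
    cudnnConvolutionFwdAlgo_t algo;

    // Strategy: let the library pick the fastest algorithm within a scratch-memory limit.
    cudnnGetConvolutionForwardAlgorithm(handle, xDesc, wDesc, convDesc, yDesc,
                                        CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT,
                                        /*memoryLimitInBytes=*/64 << 20, &algo);

    // How much workspace does the chosen algorithm actually need?
    size_t workspaceBytes = 0;
    cudnnGetConvolutionForwardWorkspaceSize(handle, xDesc, wDesc, convDesc, yDesc,
                                            algo, &workspaceBytes);
    printf("algo = %d, workspace = %zu bytes\n", (int)algo, workspaceBytes);

    // ...or bypass the strategy and name an algorithm directly, e.g.:
    algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM;
    (void)algo;
}
```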
CUDNN V2 NEW FEATURES
Other key new features:
• Support for 3D datasets. Community feedback desired!
• OS X support
• Zero-padding of borders in pooling routines
• Parameter scaling
• Improved support for arbitrary strides
• Support for upcoming Tegra X1 via JIT compilation
See Release Notes for details…
CUDNN V2 API CHANGES
Important – API Has Changed
• Several of the new improvements required changes to the cuDNN API.
• Applications previously using cuDNN V1 are likely to need minor modifications.
• Note: the Im2Col function is currently exposed as a public function…but will be removed.
The cuDNN team genuinely appreciates all feedback from the deep learning community.
The team carefully considers any API change.
cuDNN is still young…API changes are expected to become rare in the future.
Using cuDNN
CUDNN EASY TO ENABLE
CUDA 6.5 or newer required
Caffe:
• Install cuDNN on your system
• Download Caffe
• In Caffe's Makefile.config, uncomment USE_CUDNN := 1
• Install Caffe as usual
• Use Caffe as usual
Torch:
• Install cuDNN on your system
• Install Torch as usual
• Install the cudnn.torch module
• Use the cudnn module in Torch instead of the regular nn module (replace nn with cudnn)
• The cudnn module is API compatible with the standard nn module
DIGITS
Interactive Deep Learning GPU Training System
Data Scientists & Researchers:
Quickly design the best deep neural network (DNN) for your data
Visually monitor DNN training quality in real-time
Manage training of many DNNs in parallel on multi-GPU systems
developer.nvidia.com/digits
34
Choose a default network, modify
one, or create your own
Main Console
Create your dataset
Configure your Network
Start Training
Choose your database
DIGITS Workflow
Create your database
Configure your model
Start training
DIGITS
Visualize DNN performance in real time
• Training status: accuracy and loss values during training, learning rate
• Download network files
• Classification with the network snapshots
• Compare networks
Try it today!
developer.nvidia.com/cuDNN