Alex Sabatier from Nvidia talks about the future of Deep Learning from a chipmaker's perspective
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
6. PERFORMANCE GAP CONTINUES TO GROW
[Chart: Peak Memory Bandwidth (GB/s), 2008 to 2016, NVIDIA GPU (M1060, M2090, K20, K80, Pascal) vs x86 CPU]
[Chart: Peak Double Precision FLOPS, 2008 to 2016, NVIDIA GPU (M1060, M2090, K20, K80, Pascal) vs x86 CPU]
7. CREDENTIALS BUILT OVER TIME
300K CUDA Developers, 4x Growth in 4 Years
Majority of HPC Applications are GPU-Accelerated, 410 and Growing
100% of Deep Learning Frameworks are Accelerated
[Chart: # of GPU-accelerated applications per year: 113 (2011), 206 (2012), 242 (2013), 287 (2014), 370 (2015), 410 (2016)]
Industries: Academia, Games, Finance, Manufacturing, Internet, Oil & Gas, National Labs, Automotive, Defense, M&E
Frameworks: TORCH, THEANO, CAFFE, MATCONVNET, PURINE, MOCHA.JL, MINERVA, MXNET*, BIG SUR, TENSORFLOW, WATSON, CNTK
9. DEEP LEARNING: A NEW COMPUTING MODEL
[Diagram: Training, where a learning algorithm consumes "millions of trillions of FLOPS"; Inference, where the trained network runs on a device, e.g. captioning an image as "little girl is eating piece of cake"]
10. IMPACT OF AI IS HUGE FOR ENTERPRISES
Google: Hundreds of millions of dollars in power savings with AI-operated data center
Netflix: $1 billion savings per year with AI-assisted recommendation engine
AdTheorent: 300% higher user engagement for mobile advertisers during shopping season
13. NVIDIA DGX-1
170 TFLOPS FP16
Accelerates Major AI Frameworks
Dual 10GbE, Quad IB 100Gb
3RU, 3200W
14. DGX-1 — A LEAGUE OF ITS OWN
[Chart: Relative training performance (0X to 16X) on ResNet, Inception v3, AlexNet, VGG, and MSR workloads for GeForce GTX TITAN X, GeForce GTX 1080, Tesla P100, DIGITS DevBox (4X GeForce GTX TITAN X), Quadro VCA (8X Quadro M6000), and DGX-1 (8X Tesla P100)]
Caffe on DeepMark. GeForce TITAN X and GTX 1080 system: Intel Core i7-5930K @ 3.5 GHz, 64 GB system memory | Tesla P100 (SXM2) system: dual-CPU server, Intel E5-2698 v4 @ 2.2 GHz, 256 GB system memory
15. DGX STACK
Complete Analytics and Deep Learning Platform
Instant productivity: plug-and-play, supports every AI framework and accelerated analytics software application
Performance optimized across the entire stack
Always up to date via the cloud
Mixed framework environments: bare-metal and containerized
Direct access to NVIDIA experts
16. NVIDIA EXPERTISE AT EVERY STEP
Solution Architects: 1:1 support, network training setup, network optimization
Deep Learning Institute: certified expert instructors, worldwide workshops, online courses
GTC Conferences: epicenter of industry leaders, onsite training, global reach
Global Network of Partners: NVIDIA Partner Network, OEMs, startups
17. NVIDIA DEEP LEARNING PARTNERS
Graph and Data Analytics | Enterprises | Data Management | DL Frameworks | Enterprise DL Services | Core Analytics Tech
18. NVIDIA DEEP LEARNING EVERYWHERE, EVERY PLATFORM
TITAN X: available via e-tail in 200+ countries
DGX-1: the AI appliance for instant productivity
TESLA: servers in every shape and size
CLOUD: everywhere
19. VISIT THE DEEP LEARNING WEBPAGE
http://www.nvidia.com/object/deep-learning.html
Presenter's notes
First, I am going to provide some insight into NVIDIA and the GPU, which is at the center of its strategy.
Then we will discover deep learning, this new field of AI that uses massive datasets to create software solving amazing problems across the whole industry.
I also want to mention how NVIDIA GPUs are making their way into the data center, not only for HPC and the nascent field of deep learning, but also for accelerated data analytics, now that the amount of available data grows exponentially.
You will discover what makes DGX-1 a unique platform, powered by the five miracles of the latest GPU generation, named Pascal. Its unique hardware configuration explains the unprecedented power of this appliance.
To make it a tool in its own right, NVIDIA takes great pride in the dedicated software stack, which lets users take advantage of the raw power without spending hours configuring and loading the various software components.
What we wanted to achieve is to give data scientists and deep learning practitioners a ready-to-use platform powerful enough that they can directly express their creativity and solve problems they were not able to address until now.
Ready? So let's start.
We could say that NVIDIA makes computers that are loved by the most demanding users in the world — gamers, designers, and scientists.
NVIDIA pioneered GPU computing, a supercharged form of computing, 10 years ago, making CUDA, the GPU programming language, accessible on all of our platforms, including the gaming ones. Yes, a gamer can take a break from his favorite game and spend some time programming the GPU he was just using for gaming.
The GPU evolved from a 3D graphics chip into a computing platform that gives humans the power to simulate virtual worlds: the fabulous and rich worlds created by game developers, of course, but also the creations of industrial designers and architects. 90% of workstations use NVIDIA graphics solutions to render and visualize those creations. The movie industry also uses our graphics solutions for CGI, special effects, and animated movies. NVIDIA works very closely with application developers and their users to make sure the whole chain benefits from the power of the GPU.
That's not all: biologists and medical imaging specialists also use GPUs to simulate a drug's effect on a specific molecule. GPUs are also used to simulate weather and make forecasts, to run financial simulations, and by oil and gas companies to explore underground resources.
That’s computing human imagination.
But GPUs are also amazingly powerful at understanding the world around them.
Computer vision and deep learning algorithms allow computers to see and understand the world around them: recognizing road lanes, traffic signs, trucks, cars, bikes, and pedestrians in a self-driving car; helping a robot identify the objects it can grab; recognizing valuable objects among garbage on a recycling line at a waste processing plant.
The impact of visual recognition is almost infinite.
But that is not all: machines can now listen, understand, and translate what they hear. Have you tried OK Google on an Android device? It works pretty well; it even works for me, with my thick French accent.
With the ability to see and hear, those computers are simulating human intelligence. GPUs trained the Google DeepMind system that beat the world champion at the game of Go.
So why do GPUs excel at those tasks?
It all comes down to an architectural difference versus traditional CPUs.
GPU computing is a type of heterogeneous computing, that is, parallel computing with multiple processor architectures involving both the CPU and the GPU.
The CPU is an older architecture known for being sequential and featuring a complex instruction set. CPUs now have a handful of cores: 2, 4, 8, up to 20 or 24 for the most powerful ones.
A GPU contains up to several thousand cores able to execute similar instructions in the same clock cycle. This is highly efficient for today's computing applications, which have data-intensive requirements and a large amount of parallelism.
GPUs are best at accelerating the parallel parts of any application by spreading the computation over those thousands of cores.
The result is a significant speedup in application execution time.
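To make that concrete, here is a minimal CUDA sketch (my illustration, not part of the original deck) of the programming model being described: the loop over array elements is replaced by one lightweight thread per element, and the GPU executes thousands of them concurrently.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread handles exactly one element of the arrays.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                        // one million elements
    const size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);                 // unified memory keeps the sketch short
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;   // enough blocks to cover every element
    vecAdd<<<blocks, threads>>>(a, b, c, n);          // launch ~1M threads across the GPU cores
    cudaDeviceSynchronize();

    printf("c[0] = %.1f\n", c[0]);                // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Compiled with nvcc, the same source runs unchanged on anything from a GeForce card to a Tesla P100; only the number of cores available to absorb those threads changes.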
Andrew Ng, head of Baidu's AI research, views the GPU as a high-throughput parallel engine.
The deep learning algorithm used by Andrew's team in Deep Speech 2, the name of his speech recognition system, requires a lot of parallel throughput to train the model.
It would not be possible to use a CPU for that. Thanks to the GPU, the model is so capable that it can understand Mandarin spoken by a 5-year-old.
Let me show you some numbers.
These two diagrams show the evolution of compute power and memory bandwidth for both the GPU (green curve) and the CPU (blue curve) over the last 8 years.
It is obvious that the trends are completely different. While the CPU's compute power, expressed here in floating-point operations per second, follows Moore's law, the GPU increases its compute power significantly generation after generation. The latest Pascal P100 Tesla GPU is close to 5 TFLOPS of double precision, while the latest x86 CPUs barely reach half a TFLOPS.
The performance increase comes from parallelizing the computation across an increasing number of cores. GPUs have a powerful compute capability, but that alone is not enough.
To be truly efficient, data needs to arrive quickly at the cores to be processed. The curve on the right-hand side shows the evolution of memory bandwidth in GB/s; once again, the diagram speaks for itself.
NVIDIA engineered its GPUs so that the compute power and memory bandwidth address the latest computing needs of the industry.
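As a small aside (not part of the original notes), you can read these figures for whatever GPU you are running on directly from the CUDA runtime; the bandwidth number below uses the usual rough estimate of 2 x memory clock x bus width for double-data-rate memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);            // properties of GPU 0

    // Rough peak bandwidth: 2 (DDR) * memory clock (Hz) * bus width (bytes), in GB/s.
    double peakGBs = 2.0 * (prop.memoryClockRate * 1e3)
                     * (prop.memoryBusWidth / 8.0) / 1e9;

    printf("%s: %d SMs, %d-bit memory bus, ~%.0f GB/s peak bandwidth\n",
           prop.name, prop.multiProcessorCount, prop.memoryBusWidth, peakGBs);
    return 0;
}
```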
Raw power and memory bandwidth are a great starting point, but it is only when a great ecosystem develops solutions on the platform that it truly serves its purpose.
NVIDIA built a platform that HPC can build on. It started 10 years ago, when CUDA became accessible to developers; accelerated computing is now the most important platform for HPC.
No other company in the world comes close.
There is a wide ecosystem of application developers, researchers, students, and users that NVIDIA feeds with the latest updates, training, and conferences, creating a strong link that allows us to understand actual usage and future needs and provide the platform of tomorrow.
The majority of HPC apps are now accelerated. Every HPC center can enjoy the benefits of GPUs.
410 apps now (9 of the top 10 – 35 of the top 50)
It is driven by the world's largest ecosystem of HPC developers: 4x growth. Why? Because CUDA and OpenACC uniquely provide a simple way to get performance on parallel architectures.
DL: All frameworks optimized for GPUs
Training neural networks to perform these tasks is a huge computational challenge. It can take weeks or even months to design and train a neural network to perform a specific task at near-human levels of accuracy. Fortunately, much of this work can be performed in parallel on modern GPUs.
We created dedicated GPU-accelerated libraries and SDKs to allow those frameworks to be GPU accelerated.
Name some of them.
Today these GPU-accelerated DL frameworks are being used by researchers and scientists worldwide to solve problems in computer vision, speech translation, and natural language understanding that were previously considered impossible to solve.
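To give a concrete feel for what "GPU accelerated" means underneath those frameworks (my illustration, not a slide from the deck), most of the work in a fully connected layer reduces to large matrix multiplies, which libraries such as cuBLAS execute on the GPU. A minimal sketch, assuming the CUDA toolkit is installed and the file is linked with -lcublas:

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 1024;                           // square matrices for simplicity
    const size_t bytes = (size_t)n * n * sizeof(float);
    float *A, *B, *C;
    cudaMallocManaged(&A, bytes);
    cudaMallocManaged(&B, bytes);
    cudaMallocManaged(&C, bytes);
    for (int i = 0; i < n * n; ++i) { A[i] = 1.0f; B[i] = 1.0f; C[i] = 0.0f; }

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    // C = alpha * A * B + beta * C (column-major): the core operation of a dense layer.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n, &alpha, A, n, B, n, &beta, C, n);
    cudaDeviceSynchronize();

    printf("C[0] = %.0f\n", C[0]);                // expect 1024
    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```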
You may have heard of the rise of GPUs in the data center.
Microsoft Azure and Amazon AWS raced for the general availability of K80 clusters to enable GPU cloud computing services.
Google is a massive user of NVIDIA GPUs. Jeff Dean, the head of Google Brain, came to our annual GPU conference, GTC, and claimed during his keynote that deep learning would not be possible without GPUs; CPUs are too slow.
That was a massive endorsement.
IBM's cognitive computing system Watson uses NVIDIA GPUs.
The most powerful US supercomputers also rely on NVIDIA GPUs, at Oak Ridge and Livermore National Labs.
Because the massively parallel GPU was widely available, more and more people started to pick it up.
4-5 years ago, AI researchers discovered it.
The combination of deep neural networks, big data and powerful GPU platforms reignited AI.
Deep learning — a new computing model where networks are “trained” to extract features from massive amounts of data — has proven to be remarkably effective at solving some of the most complex problems in computer science.
In this example, a network is trained with millions of images; the learning phase requires trillions and trillions of operations to create a neural network.
This neural net is now able to recognize and describe an image it has never seen before.
When the neural net struggles to understand a picture, the image is analyzed by data scientists and added to the initial data set to train an updated model, which will succeed when a similar picture is presented.
It reminds me of this quote from Mandela: "I never lose; either I win or I learn something."
Deep learning allows neural networks to learn from their mistakes.
Deep learning applications are highly valuable; let me give you some examples.
Netflix uses a deep learning powered recommendation algorithm that submits personalized suggestions to the user.
As Netflix explained, strong recommendations increase the amount of time viewers watch content on Netflix, keeping subscriber churn as low as possible. According to a paper published by Netflix executives, the on-demand video streaming service claims its AI-assisted recommendation system saves the company $1 billion per year.
Google was a pioneer in deep learning, using GPUs with Search. Google CEO Sundar Pichai recently praised deep learning for new applications like the Assistant or the new messaging app Allo, which provides pre-formatted contextual answers to text messages.
Google nailed the value of deep learning; there are now 1,600 applications using deep learning.
They even use it to manage the power of their numerous data centers. The neural networks control "about 120 variables in the data centers," including "the fans and the cooling systems, the windows and other things." The AI worked out the most efficient methods of cooling by analyzing data from sensors among the server racks, including information on things like temperatures and pump speeds.
http://www.theverge.com/2016/7/21/12246258/google-deepmind-ai-data-center-cooling
"One last example that might help add context to the use of deep learning and the type of results it can deliver is AdTheorent,
This company built a DL system to deliver real-time bidding assistance to advertisers bidding on ads for mobile devices.
It helped advertisers reach an engagement level that was 200 to 300 per cent higher than the industry average for the last holiday shopping season.
This is an example of a system that needs a very fast response time and carries an incredible value.
Because we wanted the whole community of data scientists and deep learning practitioners, across all industries, to be able to create those fantastic use cases, we decided to build the ultimate appliance for it.
We know GPUs and how to accelerate the DL frameworks best, so we created DGX-1 to unleash deep learning's enormous promise.
DGX1 is an AI supercomputer-in-a-box.
DGX-1 is a plug-and-play appliance with the computing power of a 250-node HPC cluster: as I told you, a shrunk version of a data center, delivered with a built-in software stack supporting the main DL frameworks as well as GPU-accelerated database and graph applications.
A dream come true for users, who can simply focus on their task and not spend cycles preparing and maintaining their software tools.
Let’s have a look inside:
DGX-1 delivers 170 TFLOPS of FP16 performance thanks to eight P100 SXM2 GPUs, each with 16 GB of HBM2 providing 732 GB/s of memory bandwidth, for a total of 28,672 CUDA cores (8 x 3,584)!
Those eight GPUs are connected with NVLink, NVIDIA's new interconnect that lets data be shared between GPUs 5 times faster than the latest PCIe Gen3 bus.
The GPUs are connected in a hybrid cube mesh topology, in which the farthest GPU is only two hops away. This is ideal when you want a large job to be spread across several GPUs, and it is what you want for a highly scalable system.
Two 20-core Xeon CPUs with a total of 256 GB of system memory handle scheduling and feed data to the GPUs.
This 3RU box requires a 3200 W power supply and connects to other DGX-1 boxes, or to the rest of your data center, through dual 10 Gb Ethernet and a quad InfiniBand 100 Gb/s EDR interface.
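From the software side, that NVLink topology simply shows up as GPUs that can read and write each other's memory directly. A minimal sketch of how an application discovers and enables that (my illustration, using the standard CUDA peer-access API, not code from the deck):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);

    // For every GPU pair, check whether direct peer-to-peer access is possible
    // (over NVLink on DGX-1, or PCIe elsewhere) and enable it so that copies
    // and kernel accesses bypass host memory.
    for (int src = 0; src < ndev; ++src) {
        cudaSetDevice(src);
        for (int dst = 0; dst < ndev; ++dst) {
            if (src == dst) continue;
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, src, dst);
            if (canAccess) {
                cudaDeviceEnablePeerAccess(dst, 0);
                printf("GPU %d -> GPU %d: peer access enabled\n", src, dst);
            }
        }
    }
    return 0;
}
```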
P100 SXM2 is a new generation of GPU with a dedicated form factor specifically designed for the data center.
With the DGX-1 platform, you now have all of the tools needed for realizing the benefits of GPU-accelerated deep learning. NVIDIA already provides a number of libraries for accelerating computational performance through GPUs. Some of these libraries, like NCCL, have actually been optimized specifically for the 8-GPU architecture on DGX-1.
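As an aside (my sketch, not NVIDIA's code), this is roughly what using NCCL looks like for the gradient all-reduce at the heart of data-parallel multi-GPU training; it assumes NCCL 2's group API and a single process driving all visible GPUs:

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <nccl.h>

int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);

    // One NCCL communicator per GPU, all managed by this single process.
    int* devs = new int[ndev];
    for (int i = 0; i < ndev; ++i) devs[i] = i;
    ncclComm_t* comms = new ncclComm_t[ndev];
    ncclCommInitAll(comms, ndev, devs);

    const size_t count = 1 << 20;                 // pretend these are gradient buffers
    float** grad = new float*[ndev];
    cudaStream_t* streams = new cudaStream_t[ndev];
    for (int i = 0; i < ndev; ++i) {
        cudaSetDevice(i);
        cudaMalloc(&grad[i], count * sizeof(float));
        cudaMemset(grad[i], 0, count * sizeof(float));
        cudaStreamCreate(&streams[i]);
    }

    // Sum the buffers across all GPUs in place, as a training step would do
    // after each backward pass; NCCL routes the traffic over NVLink on DGX-1.
    ncclGroupStart();
    for (int i = 0; i < ndev; ++i)
        ncclAllReduce(grad[i], grad[i], count, ncclFloat, ncclSum, comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < ndev; ++i) { cudaSetDevice(i); cudaStreamSynchronize(streams[i]); }
    for (int i = 0; i < ndev; ++i) ncclCommDestroy(comms[i]);
    printf("all-reduce across %d GPUs done\n", ndev);
    return 0;
}
```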
NVIDIA Docker containers provide packaged applications such as DL frameworks that are multi-GPU aware, AND you can schedule and run these from the NVIDIA cloud service. All these tools also make it easy for you to build, package, and run your own containerized applications on DGX-1. In other words, with NVIDIA libraries, containers, and DGX-1, you’ve got everything you need for developing and running multi-GPU accelerated applications.
Container Based Applications – Easily run accelerated computing and deep learning frameworks through GPU aware containers. Build your own containers and host them in a private repository through the cloud service.
NVIDIA Cloud Management – Manage your node or cluster and run containers from the NVIDIA cloud service. Connecting to the cloud is easy; just plug in power and internet and you’re ready to go.
Deep learning is a fundamentally new software model that needs a new computing platform.
GPU computing is an ideal approach and the GPU is the ideal processor.
A combination of factors is essential to create a new computing platform — performance, programming productivity, and open accessibility.
Performance. NVIDIA GPUs are naturally great at parallel workloads and speed up DNNs by 10-20x, reducing each of the many training iterations from weeks to days.
Programmability. AI innovation is on a breakneck pace.
Ease of programming and developer productivity is paramount.
The programmability and richness of NVIDIA’s CUDA platform allow researchers to innovate quickly — building new configurations of CNNs, DNNs, deep inception networks, RNNs, LSTMs, and reinforcement learning networks.
Accessibility. Developers want to create anywhere and deploy everywhere.
NVIDIA GPUs are available all over the world, from every PC OEM; in desktops, notebooks, servers, or supercomputers; and in the cloud from Alibaba, Amazon, Baidu, Google, IBM and Microsoft.