This is a 2-hour overview of the state of deep learning as of Q1 2017.
Starting with some basic concepts, continuing to basic network topologies, tools, HW/accelerators, and finally Intel's take on the different fronts.
1. Eran Shlomo, IPP tech lead, Haifa
eran.shlomo@intel.com, eran@dataloop.ai
2. About me
Haifa IoT Ignition Lab and IPP (Intel Ingenuity Partnership Program) tech lead.
Intel Perceptual Computing.
Compute, cloud, and embedded expert.
Maker and entrepreneur.
Focused on data science and machine learning in recent years.
Soon to work on dataloop.ai
3. Agenda
What is deep learning
Why now?
Different network topologies and their usage
The tools race
The processors (HW) race
5. Deep learning – basic anatomy
• Data driven
• Training a model
• Input, output, and hidden neurons
[Diagram: input layer → hidden layer(s) → output layer]
Deep learning = many hidden (deep) layers
6. The essence of deep learning
[Diagram: inputs Xi mapped to outputs Yi through weight layers Wij(1), Wij(2)]
Y = f(X) = W·X + b
A deep network is essentially a function that we train to detect some pattern; b (the bias) is omitted in the drawing. A minimal sketch of one such layer follows.
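To make the equation concrete, here is a minimal NumPy sketch of one such linear layer; the sizes and random values are illustrative assumptions, not from the deck.

```python
# A minimal sketch of the slide's equation Y = f(X) = W·X + b in NumPy.
# Sizes and random values are illustrative assumptions, not from the deck.
import numpy as np

np.random.seed(0)
X = np.random.randn(3)         # input vector (3 features)
W = np.random.randn(2, 3)      # weights: 3 inputs -> 2 outputs
b = np.random.randn(2)         # bias (the term omitted in the drawing)

Y = W @ X + b                  # one layer's linear map
print(Y)
# A deep network stacks many such maps with nonlinearities in between.
```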
Data is becoming the fuel behind new SW.
BK (Intel CEO) – "Data is the new oil"
7. Why now?
[Chart: ILSVRC top-5 error by year, falling as networks get deeper and data grows]
2010: 28.2 (shallow) · 2011: 25.8 (shallow) · 2012: 16.4 (AlexNet, 8 layers) · 2013: 11.7 · 2014: 6.7 (22 layers) · 2015: 3.57 (152 layers) · 2016: 2.99 (ensemble)
8. Neural networks – Background and inspiration
It is pretty common to compare neural networks to how our brain works:
• It couples well with the term AI
• It makes some sense, as many different studies show; yet we are still a long way from really understanding how the brain works.
[Diagram: a single neuron with inputs X1, X2, X3 and weights W1, W2, W3]
output = f( Σ_{k=0}^{n} W_k · X_k )
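As a worked instance of the neuron formula above, a small sketch in plain Python; the sigmoid is just an assumed example choice for f, and the numbers are made up for illustration.

```python
# One neuron: output = f(sum_k W_k * X_k), with sigmoid as an example f.
import math

X = [0.5, -1.0, 2.0]   # inputs X1..X3 (toy values)
W = [0.1, 0.4, -0.2]   # weights W1..W3 (toy values)

z = sum(w * x for w, x in zip(W, X))   # the weighted sum
output = 1.0 / (1.0 + math.exp(-z))    # f: sigmoid, an assumed choice
print(output)
```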
9. Network topologies
• There are many network topologies
• The same basic principles apply to most:
• Supervised learning
• Hidden units
• Backpropagation training
• Minimizing a cost function
• Training on data generates a model, later used for inference on unseen data (a minimal sketch follows)
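A minimal sketch of these principles in NumPy: gradient descent minimizing a mean-squared-error cost for a one-parameter supervised model. Data, learning rate, and step count are illustrative assumptions.

```python
# Gradient descent minimizing a mean-squared-error cost for y ≈ w·x.
import numpy as np

np.random.seed(0)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 0.1 * np.random.randn(4)    # noisy supervised targets

w = 0.0                                   # the model's single parameter
lr = 0.05                                 # learning rate
for step in range(200):
    pred = w * x                          # forward pass
    grad = 2 * np.mean((pred - y) * x)    # d(MSE)/dw
    w -= lr * grad                        # step down the cost surface
print(w)                                  # converges near 2.0
```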
10. Some basic intuition
A model has capacity: its number of parameters (a counting sketch appears below).
Generally, HW (compute/memory) limits the capacity.
From the paper: "An Analysis of Deep Neural Network Models for Practical Applications"
[Diagram: a reinforcing cycle – more compute & data → higher accuracy → bigger models]
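To make "capacity = number of parameters" concrete, a small counting sketch for fully connected layers; the layer sizes are an arbitrary example.

```python
# Count parameters (weights + biases) of a fully connected network.
def fc_param_count(layer_sizes):
    """Weights plus biases for each consecutive layer pair."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(fc_param_count([784, 256, 128, 10]))  # ~235k parameters
```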
12. Training a model – bias/variance "games"
We can look at our model error as follows:
[Diagram: total error = model error + noise]
Our error usually comes from a combination of the two. These are all equivalent:
• High variance = modeling the noise = not enough data = model too big = overfitting
• High bias = model too simple = underfitting
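A hedged NumPy illustration of the overfit/underfit bullets: fitting the same noisy data with models that are too simple, about right, and too big. The sine signal and polynomial degrees are assumptions made for the demo.

```python
# Underfit vs. overfit: fit noisy data with polynomials of varying capacity.
import numpy as np

rng = np.random.RandomState(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + 0.2 * rng.randn(20)   # signal + noise

x_test = np.linspace(0, 1, 100)                   # unseen data
y_test = np.sin(2 * np.pi * x_test)

for degree in (1, 3, 15):            # underfit / about right / overfit
    coeffs = np.polyfit(x, y, degree)
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, round(train_err, 4), round(test_err, 4))
# High degree drives train error toward 0 while test error grows: variance.
```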
14. Fully connected networks
A very basic/generic network: full connectivity between the nodes of adjacent layers.
Used as a building block in more complex topologies.
High-level task: maps features into classes (a Keras sketch follows).
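A minimal fully connected classifier sketch in Keras (the high-level framework mentioned later in this deck); the layer sizes, optimizer, and loss are illustrative assumptions.

```python
# A tiny fully connected classifier: features in, class probabilities out.
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(256, activation='relu', input_dim=784))  # hidden layer
model.add(Dense(10, activation='softmax'))               # features -> classes
model.compile(optimizer='sgd', loss='categorical_crossentropy')
model.summary()
```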
15. Convolutional neural networks
Fully connected networks work pretty well on very simple images converted into vectors, but:
• Small images (~10x10) work well; bigger images (~100x100) don't:
• Too many parameters are needed to train FC networks that way; it is not practical. A 100x100 image is 10K pixels, so a 2-layer FC network with a same-size hidden layer already needs ~100M parameters (10K × 10K weights).
Enter convolutional neural networks:
• They encode spatial dependency, a kind of weight sharing
• Two main parts:
• Conv/subsample layers act as feature generators
• FC layers map the feature ensemble into classes (sketch below)
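A sketch of the two main parts in Keras: conv/subsample layers as feature generators followed by an FC classifier. Filter counts and shapes are assumptions for illustration.

```python
# Convnet anatomy: feature generators (conv/subsample), then FC classifier.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu',
                 input_shape=(100, 100, 1)))   # feature generator
model.add(MaxPooling2D((2, 2)))                # subsample
model.add(Flatten())
model.add(Dense(10, activation='softmax'))     # FC: features -> classes
model.compile(optimizer='sgd', loss='categorical_crossentropy')
```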
16. Recurrent neural networks
In general, neural networks work well on bounded domains, i.e. the kind of data collected for training.
To predict time-series data (like stocks, ...) we need a time factor.
RNNs:
• Neurons are self-connected
• Backpropagated through time
• Each timestamp is now considered a layer
• Issue: we need a deep network (many layers) → vanishing gradient problem
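A minimal sketch of the self-connected neuron unrolled through time; the scalar weights and toy series are assumptions. Each loop iteration plays the role of one layer, which is where the vanishing gradient problem comes from.

```python
# A recurrent neuron unrolled in time: h_t = tanh(W_x·x_t + W_h·h_{t-1}).
import numpy as np

W_x, W_h = 0.5, 0.9       # input weight and the recurrent (self) weight
h = 0.0                   # initial hidden state
for x_t in [1.0, 0.5, -0.3, 0.8]:      # a toy time series
    h = np.tanh(W_x * x_t + W_h * h)   # each timestamp acts like a layer
    print(h)
# Gradients flowing back through many such steps keep shrinking,
# which is the vanishing gradient problem mentioned above.
```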
17. Long Short-Term Memory (LSTM) networks
Solve the vanishing gradient problem; long memory by default.
Contain gates that act as decision points.
LSTMs are usually preferred over plain RNNs: more compute is needed per timestamp, but overall accuracy is better.
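A hedged Keras sketch showing how an LSTM drops in where a plain RNN would go; the gates live inside the LSTM layer. Sequence length and sizes are assumptions.

```python
# An LSTM over a toy time series: 50 timestamps, 1 feature each.
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(32, input_shape=(50, 1)))   # gated recurrent layer
model.add(Dense(1))                        # e.g. next-value prediction
model.compile(optimizer='adam', loss='mse')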
19. Where we are from a technology timeline perspective
General programming: Assembly → C (compiler) → C++ (OOP) → Java (managed) → Python (runtime)
Deep learning: model protos → high-level frameworks (e.g. Keras) → ???? → ???? → ????
20. The programming language
Science, data science, and deep learning are very close friends.
3 main languages [shown as logos in the original slide], all frontend languages with a performant backend language (C++).
My personal take…: Python is the leading language:
• Free
• Has won over the deep learning community
• Most of the new tools/frameworks are Python-friendly
• Production friendly
• Easy low-level binding (a minimal sketch below)
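One illustration of the "easy low-level binding" point: calling a C library from stock Python via ctypes, no build step required (a POSIX libm is assumed here).

```python
# Call into a C library directly from Python's standard library.
import ctypes, ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m"))  # load the C math library
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double
print(libm.cos(0.0))   # 1.0, computed by the C runtime
```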
21. Frameworks
Big frameworks, each supported by a large ecosystem:
Caffe, TensorFlow, MXNet, Keras, Torch, CNTK, Theano
Good comparison reference: https://github.com/zer0n/deepframeworks
Packages in other ecosystems: nnet, MXNet, darch, deepnet, H2O, Neural Network Toolbox
22. The big data/Cloud arena
All major cloud providers have ML services, deep learning model development included.
There are many other dedicated cloud services, some already acquired by tier-1 providers:
• Nervana
• Databricks
• Turi (GraphLab)
• H2O
• ..
24. Currently NVIDIA rules
Market top-level segmentation:
• Training – building the model; data center
• Inference – running the model; also edge/client
In the short term, Intel is positioned to take significant inference market share (SW moves only, on existing x86 HW).
25. The (rough) deep learning compute math
• We have model capacity
• We have chip capacity
• Throughput = chip capacity / model capacity (a rough worked example follows)
But the story has a few twists. It turns out that:
• Models can work well with low-precision parameters
• There are a lot of sparse areas
• Memory plays a significant role as well
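A back-of-envelope version of the throughput formula; both figures are rough public numbers used purely for illustration (~4 GFLOPs per forward pass for a ResNet-50-class model, ~10 TFLOPS peak for a 2016-era high-end GPU), and the twists listed above are ignored.

```python
# Throughput = chip capacity / model capacity, at ideal utilization.
chip_flops = 10e12    # ~10 TFLOPS peak: a rough 2016-era GPU figure
model_flops = 4e9     # ~4 GFLOPs/forward pass: a ResNet-50-class model

print(chip_flops / model_flops)   # ~2500 inferences/s, ignoring memory limits
```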
26. A new wave of compute architectures is coming
• Handle 16-, 8-, 4-, 2-, and 1-bit networks
• Expect a 100-300x effective compute boost
• Memory path adjustments
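A minimal sketch of the low-precision idea behind those bit widths: linearly quantizing float32 weights to int8 and back. The symmetric scaling scheme is the simplest possible one, chosen for illustration.

```python
# Linear symmetric quantization of float32 weights to int8 and back.
import numpy as np

np.random.seed(0)
w = np.random.randn(5).astype(np.float32)     # float32 weights
scale = np.abs(w).max() / 127.0               # map weight range onto int8
w_int8 = np.round(w / scale).astype(np.int8)  # 8-bit representation
w_back = w_int8.astype(np.float32) * scale    # dequantize to compare

print(np.abs(w - w_back).max())               # quantization error stays small
```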
27. The race to AI silicon has kicked off
Everybody is playing: startups, technology companies (verticals), corporations.
Segments of the game:
• ASIC vs. FPGA
• Edge vs. cloud
• Inference vs. training
• Network-generic vs. network-specific
• Model architectures / ecosystem