This is a 2-hour overview of the state of deep learning as of Q1 2017.
Starting with some basic concepts, continuing to basic network topologies, tools, HW/accelerators, and finally Intel's take on the different fronts.
1. Eran Shlomo, IPP tech lead, Haifa
eran.shlomo@intel.com, eran@dataloop.ai
2. About me
Haifa IoT Ignition Lab and IPP (Intel Ingenuity Partnership Program) tech lead.
Intel Perceptual Computing.
Compute, cloud, and embedded expert.
Maker and entrepreneur.
Focused on data science and machine learning in recent years.
Soon to work on dataloop.ai
3. Agenda
What is deep learning
Why now?
Different network topologies and their usage
The tools race
The processors (HW) race
5. Deep learning – basic anatomy
• Data driven
• Training a model
• Input, output, and hidden neurons
[Diagram: input layer → hidden layer(s) → output layer]
Deep learning = many hidden (deep) layers
6. The essence of deep learning
[Diagram: inputs Xi mapped to outputs Yi through weight layers Wij(1), Wij(2)]
Y = f(X) = W·X + b
A deep network is essentially a function that we train to detect some pattern; b (the bias) is omitted in the drawing. A minimal sketch of one such layer follows.
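To make the equation concrete, here is a minimal NumPy sketch of one such linear layer; the sizes and random values are illustrative assumptions, not from the deck.

```python
# A minimal sketch of the slide's equation Y = f(X) = W·X + b in NumPy.
# Sizes and random values are illustrative assumptions, not from the deck.
import numpy as np

np.random.seed(0)
X = np.random.randn(3)         # input vector (3 features)
W = np.random.randn(2, 3)      # weights: 3 inputs -> 2 outputs
b = np.random.randn(2)         # bias (the term omitted in the drawing)

Y = W @ X + b                  # one layer's linear map
print(Y)
# A deep network stacks many such maps with nonlinearities in between.
```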
Data is becoming the fuel behind new SW.
BK (Intel CEO) – "Data is the new oil"
7. Why now?
[Chart: ILSVRC top-5 error by year, falling as networks get deeper and data grows]
2010: 28.2 (shallow) · 2011: 25.8 (shallow) · 2012: 16.4 (AlexNet, 8 layers) · 2013: 11.7 · 2014: 6.7 (22 layers) · 2015: 3.57 (152 layers) · 2016: 2.99 (ensemble)
8. Neural networks – Background and inspiration
It is pretty common to compare neural networks to how our brain works:
• It couples well with the term AI
• It makes some sense, as many different studies show; yet we are still a long way from really understanding how the brain works.
[Diagram: a single neuron with inputs X1, X2, X3 and weights W1, W2, W3]
output = f( Σ_{k=0}^{n} W_k · X_k )
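As a worked instance of the neuron formula above, a small sketch in plain Python; the sigmoid is just an assumed example choice for f, and the numbers are made up for illustration.

```python
# One neuron: output = f(sum_k W_k * X_k), with sigmoid as an example f.
import math

X = [0.5, -1.0, 2.0]   # inputs X1..X3 (toy values)
W = [0.1, 0.4, -0.2]   # weights W1..W3 (toy values)

z = sum(w * x for w, x in zip(W, X))   # the weighted sum
output = 1.0 / (1.0 + math.exp(-z))    # f: sigmoid, an assumed choice
print(output)
```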
9. Network topologies
• There are many network topologies
• The same basic principles apply to most:
• Supervised learning
• Hidden units
• Backpropagation training
• Minimizing a cost function
• Training on data generates a model, later used for inference on unseen data (a minimal sketch follows)
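A minimal sketch of these principles in NumPy: gradient descent minimizing a mean-squared-error cost for a one-parameter supervised model. Data, learning rate, and step count are illustrative assumptions.

```python
# Gradient descent minimizing a mean-squared-error cost for y ≈ w·x.
import numpy as np

np.random.seed(0)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 0.1 * np.random.randn(4)    # noisy supervised targets

w = 0.0                                   # the model's single parameter
lr = 0.05                                 # learning rate
for step in range(200):
    pred = w * x                          # forward pass
    grad = 2 * np.mean((pred - y) * x)    # d(MSE)/dw
    w -= lr * grad                        # step down the cost surface
print(w)                                  # converges near 2.0
```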
10. Some basic intuition
A model has capacity: its number of parameters (a counting sketch appears below).
Generally, HW (compute/memory) limits the capacity.
From the paper: "An Analysis of Deep Neural Network Models for Practical Applications"
[Diagram: a reinforcing cycle – more compute & data → higher accuracy → bigger models]
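To make "capacity = number of parameters" concrete, a small counting sketch for fully connected layers; the layer sizes are an arbitrary example.

```python
# Count parameters (weights + biases) of a fully connected network.
def fc_param_count(layer_sizes):
    """Weights plus biases for each consecutive layer pair."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(fc_param_count([784, 256, 128, 10]))  # ~235k parameters
```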
12. Training a model – bias/variance "games"
We can look at our model error as follows:
[Diagram: total error = model error + noise]
Our error usually comes from a combination of the two. These are all equivalent:
• High variance = modeling the noise = not enough data = model too big = overfitting
• High bias = model too simple = underfitting
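A hedged NumPy illustration of the overfit/underfit bullets: fitting the same noisy data with models that are too simple, about right, and too big. The sine signal and polynomial degrees are assumptions made for the demo.

```python
# Underfit vs. overfit: fit noisy data with polynomials of varying capacity.
import numpy as np

rng = np.random.RandomState(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + 0.2 * rng.randn(20)   # signal + noise

x_test = np.linspace(0, 1, 100)                   # unseen data
y_test = np.sin(2 * np.pi * x_test)

for degree in (1, 3, 15):            # underfit / about right / overfit
    coeffs = np.polyfit(x, y, degree)
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, round(train_err, 4), round(test_err, 4))
# High degree drives train error toward 0 while test error grows: variance.
```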
14. Fully connected networks
A very basic/generic network: full connectivity between the nodes of adjacent layers.
Used as a building block in more complex topologies.
High-level task: maps features into classes (a Keras sketch follows).
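A minimal fully connected classifier sketch in Keras (the high-level framework mentioned later in this deck); the layer sizes, optimizer, and loss are illustrative assumptions.

```python
# A tiny fully connected classifier: features in, class probabilities out.
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(256, activation='relu', input_dim=784))  # hidden layer
model.add(Dense(10, activation='softmax'))               # features -> classes
model.compile(optimizer='sgd', loss='categorical_crossentropy')
model.summary()
```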
15. Convolutional neural networks
Fully connected networks work pretty well on very simple images converted into vectors, but:
• Small images (~10x10) work well; bigger images (~100x100) don't:
• Too many parameters are needed to train FC networks that way; it is not practical. A 100x100 image is 10K pixels, so a 2-layer FC network with a same-size hidden layer already needs ~100M parameters (10K × 10K weights).
Enter convolutional neural networks:
• They encode spatial dependency, a kind of weight sharing
• Two main parts:
• Conv/subsample layers act as feature generators
• FC layers map the feature ensemble into classes (sketch below)
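A sketch of the two main parts in Keras: conv/subsample layers as feature generators followed by an FC classifier. Filter counts and shapes are assumptions for illustration.

```python
# Convnet anatomy: feature generators (conv/subsample), then FC classifier.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu',
                 input_shape=(100, 100, 1)))   # feature generator
model.add(MaxPooling2D((2, 2)))                # subsample
model.add(Flatten())
model.add(Dense(10, activation='softmax'))     # FC: features -> classes
model.compile(optimizer='sgd', loss='categorical_crossentropy')
```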
16. Recurrent neural networks
In general, neural networks work well on bounded domains, i.e. the kind of data collected for training.
To predict time-series data (like stocks, ...) we need a time factor.
RNNs:
• Neurons are self-connected
• Backpropagated through time
• Each timestamp is now considered a layer
• Issue: we need a deep network (many layers) → vanishing gradient problem
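A minimal sketch of the self-connected neuron unrolled through time; the scalar weights and toy series are assumptions. Each loop iteration plays the role of one layer, which is where the vanishing gradient problem comes from.

```python
# A recurrent neuron unrolled in time: h_t = tanh(W_x·x_t + W_h·h_{t-1}).
import numpy as np

W_x, W_h = 0.5, 0.9       # input weight and the recurrent (self) weight
h = 0.0                   # initial hidden state
for x_t in [1.0, 0.5, -0.3, 0.8]:      # a toy time series
    h = np.tanh(W_x * x_t + W_h * h)   # each timestamp acts like a layer
    print(h)
# Gradients flowing back through many such steps keep shrinking,
# which is the vanishing gradient problem mentioned above.
```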
17. Long Short-Term Memory (LSTM) networks
Solve the vanishing gradient problem; long memory by default.
Contain gates that act as decision points.
LSTMs are usually preferred over plain RNNs: more compute is needed per timestamp, but overall accuracy is better.
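A hedged Keras sketch showing how an LSTM drops in where a plain RNN would go; the gates live inside the LSTM layer. Sequence length and sizes are assumptions.

```python
# An LSTM over a toy time series: 50 timestamps, 1 feature each.
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(32, input_shape=(50, 1)))   # gated recurrent layer
model.add(Dense(1))                        # e.g. next-value prediction
model.compile(optimizer='adam', loss='mse')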
19. Where we are from a technology timeline perspective
General programming: Assembly → C (compiler) → C++ (OOP) → Java (managed) → Python (runtime)
Deep learning: model protos → high-level frameworks (e.g. Keras) → ???? → ???? → ????
20. The programming language
Science, data science, and deep learning are very close friends.
3 main languages [shown as logos in the original slide], all frontend languages with a performant backend language (C++).
My personal take…: Python is the leading language:
• Free
• Has won over the deep learning community
• Most of the new tools/frameworks are Python-friendly
• Production friendly
• Easy low-level binding (a minimal sketch below)
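One illustration of the "easy low-level binding" point: calling a C library from stock Python via ctypes, no build step required (a POSIX libm is assumed here).

```python
# Call into a C library directly from Python's standard library.
import ctypes, ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m"))  # load the C math library
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double
print(libm.cos(0.0))   # 1.0, computed by the C runtime
```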
21. Frameworks
Big frameworks, each supported by a large ecosystem:
Caffe, TensorFlow, MXNet, Keras, Torch, CNTK, Theano
Good comparison reference: https://github.com/zer0n/deepframeworks
Packages in other ecosystems: nnet, MXNet, darch, deepnet, H2O, Neural Network Toolbox
22. The big data/Cloud arena
All major cloud providers have ML services, deep learning model development included.
There are many other dedicated cloud services, some already acquired by tier-1 providers:
• Nervana
• Databricks
• Turi (GraphLab)
• H2O
• ..
24. Currently NVIDIA rules
Market top-level segmentation:
• Training – building the model; data center
• Inference – running the model; also edge/client
In the short term, Intel is positioned to take significant inference market share (SW moves only, on existing x86 HW).
25. The (rough) deep learning compute math
• We have model capacity
• We have chip capacity
• Throughput = chip capacity / model capacity (a rough worked example follows)
But the story has a few twists. It turns out that:
• Models can work well with low-precision parameters
• There are a lot of sparse areas
• Memory plays a significant role as well
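A back-of-envelope version of the throughput formula; both figures are rough public numbers used purely for illustration (~4 GFLOPs per forward pass for a ResNet-50-class model, ~10 TFLOPS peak for a 2016-era high-end GPU), and the twists listed above are ignored.

```python
# Throughput = chip capacity / model capacity, at ideal utilization.
chip_flops = 10e12    # ~10 TFLOPS peak: a rough 2016-era GPU figure
model_flops = 4e9     # ~4 GFLOPs/forward pass: a ResNet-50-class model

print(chip_flops / model_flops)   # ~2500 inferences/s, ignoring memory limits
```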
26. A new wave of compute architectures is coming
• Handle 16-, 8-, 4-, 2-, and 1-bit networks
• Expect a 100-300x effective compute boost
• Memory path adjustments
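A minimal sketch of the low-precision idea behind those bit widths: linearly quantizing float32 weights to int8 and back. The symmetric scaling scheme is the simplest possible one, chosen for illustration.

```python
# Linear symmetric quantization of float32 weights to int8 and back.
import numpy as np

np.random.seed(0)
w = np.random.randn(5).astype(np.float32)     # float32 weights
scale = np.abs(w).max() / 127.0               # map weight range onto int8
w_int8 = np.round(w / scale).astype(np.int8)  # 8-bit representation
w_back = w_int8.astype(np.float32) * scale    # dequantize to compare

print(np.abs(w - w_back).max())               # quantization error stays small
```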
27. The race to AI silicon has kicked off
Everybody is playing: startups, technology companies (verticals), corporations.
Segments of the game:
• ASIC vs. FPGA
• Edge vs. cloud
• Inference vs. training
• Network-generic vs. network-specific
• Model architectures / ecosystem