Deep learning from scratch
Eran Shlomo, IPP tech lead, Haifa
eran.shlomo@intel.com ©
About me
Haifa IoT Ignition lab and IPP (Intel Ingenuity Partnership Program) tech lead.
Intel Perceptual Computing.
Compute, cloud and embedded expert.
Maker and entrepreneur.
Focused on data science and machine learning in recent years.
Agenda
Let's talk some theory
Let's define a problem
Time to code our network
Meet the pro tools
Time for fancier networks
By the end of the day…
You will have:
• Some intuition on what DL is and what you can use it for.
• An understanding of the mechanics behind deep learning.
• A basic feel for the concepts and how DL works.
• Some hands-on experience with well-known tools.
• A list of pointers to continue your learning and experimenting.
You will not have:
• Practical experience in solving problems using DL
• An understanding of the different types of networks and their usage
• The math skills required to be an expert.
We are going to work in "try, catch up" mode
• Along the way we have exercises; you will get time to try them
• Usually the next slide will contain the solution
• So for every task:
• Try
• Catch up once the solution is on the board; focus on understanding the solution
• Make sure you have it working; each step is required for the one after it.
Buzzwords alignment attempt
[Diagram: nested circles, outermost to innermost]
• AI: machine reasoning
• Machine learning: automated tasks, trained based on data
• Supervised learning
• Deep learning: neural networks
Classical programming: input + logic → output. Machine learning: input + output → logic.
Where we are in the technology timeline perspective
[Diagram: two parallel timelines]
Programming languages: Assembly → C (compiler) → C++ (OOP) → Java (managed) → Python (runtime)
Deep learning: model protos → high level (Keras) → ???? → ???? → ????
Deep learning – basic anatomy
• Data driven
• Training a model
• Input, output and hidden neurons
[Diagram: Input layer → Hidden layer(s) → Output layer]
Deep learning = many hidden (deep) layers
The essence of deep learning
[Diagram: a two-layer network mapping inputs Xi to outputs Yi through weight matrices Wij(1) and Wij(2)]
Y = f(X) = WX + b
A deep network is essentially a function we train to detect some pattern.
b (bias) is omitted in this drawing.
Why the sudden success?
A lot of data
A lot of compute
Improved networks
Before we start , some Math …
• As data science becomes part of every business, math is gaining extra popularity.
• A question you need to ask yourself if you wish to go deeper into the field: do I want to (and can I) refresh and extend my math skills?
• For our basic deep learning course we need some:
• Algebra, mainly matrix/vector operations
• Calculus, mainly derivatives
• You can never get enough statistics in data science; go over variance, mean, distributions, probabilities
• Python
Some math references to get started with
https://www.youtube.com/watch?v=K5BLNZw7UeU Matrix operations
https://www.youtube.com/watch?v=kuixY2bCc_0 Multiplying matrices
https://www.youtube.com/watch?v=rAof9Ld5sOg Derivatives
https://www.youtube.com/watch?v=TUJgZ4UDY2g The chain rule
https://www.youtube.com/watch?v=ZkjP5RJLQF4 Linear regression
https://www.youtube.com/watch?v=_Po-xZJflPM Logistic regression
https://www.youtube.com/watch?v=Y4lTTHua0TE Mean, Variance,…
Let's practice some basics
• We are working with Python 3.5. You are encouraged to work with the conda package manager, but pip is fine as well.
• numpy is THE math operations package for Python; we will use it to play with matrices. Install it.
• Let's create 5 random normal numbers, to make sure numpy is good to go.
• Visualization is very important, both in general and in this course. Install matplotlib.
• Visualize 100 random numbers like the example above (see the sketch below).
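A minimal sketch of these warm-up steps (the plotting style of the original example is an assumption):

import numpy as np
import matplotlib.pyplot as plt

# 5 standard-normal numbers, to verify numpy is good to go
print(np.random.randn(5))

# visualize 100 standard-normal numbers
x = np.random.randn(100)
plt.plot(x, 'co')  # points only
plt.show()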
Normal random
We got 100 normally distributed numbers; let's create a histogram of them.
By default our normal distribution has mean 0 and variance 1: X ~ N(μ, σ²), where μ is the mean and σ² the variance.
Create two 10x10 matrices, A = N(0,1) and B = N(3,16), and plot their histograms.
Hint: flatten each matrix before passing it to the histogram (see the sketch below).
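A sketch of the histogram exercise (the bin count is an assumption):

import numpy as np
import matplotlib.pyplot as plt

A = np.random.normal(0, 1, (10, 10))  # N(0, 1)
B = np.random.normal(3, 4, (10, 10))  # N(3, 16): numpy takes the std, and 4^2 = 16
plt.hist(A.flatten(), bins=20, alpha=0.5, label='A ~ N(0,1)')
plt.hist(B.flatten(), bins=20, alpha=0.5, label='B ~ N(3,16)')
plt.legend()
plt.show()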
Normal random
You can map the standard normal Z ~ N(0, 1²) to any other normal distribution.
Code a function that multiplies two matrices explicitly. You can assume the inputs are numpy ndarrays; don't use numpy's matrix multiplication operator.
def mul_matrix(a,b):
    pass
Hint: a for loop iterates over rows in numpy, and matrix.T is the transpose (see the sketch below).
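A minimal sketch of explicit multiplication; the hint suggests a row-based numpy version, but this triple loop is the most literal form:

import numpy as np

def mul_matrix(a, b):
    # (n x m) times (m x p) -> (n x p)
    n, m = a.shape
    m2, p = b.shape
    assert m == m2, 'inner dimensions must match'
    out = np.zeros((n, p))
    for i in range(n):
        for j in range(p):
            for k in range(m):
                out[i, j] += a[i, k] * b[k, j]
    return out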
Matrix multiplication
Validate your code using numpy.
Hint: numpy.dot (see the check below).
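A quick check against numpy's implementation:

a = np.random.randn(4, 3)
b = np.random.randn(3, 5)
print(np.allclose(mul_matrix(a, b), np.dot(a, b)))  # expect True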
Neural networks – Background and inspiration
It is pretty common to compare neural networks to how our brain works:
• It couples well with the term AI
• There is some sense in it, as many studies show, yet we are a long way from really understanding how the brain works.
[Diagram: a single neuron with inputs X1, X2, X3 and weights W1, W2, W3, computing f(∑ₖ WₖXₖ)]
Artificial neural networks
Output = f(∑ₖ WₖXₖ), summed over k = 0…n, where:
• WX – inputs multiplied by weights
• f(x) is an activation function
Common activation functions: sigmoid, relu, tanh, linear, …
1. Code the sigmoid function; use numpy, z can be a vector:
def sigmoid(z):
    …
2. Plot 100 points of your sigmoid.
Hint: plt.plot(X, Y, 'co') draws points only (see the sketch below).
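A sketch of the sigmoid exercise (the input range for the plot is an assumption):

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    # elementwise, so z can be a scalar, a vector or a matrix
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-6, 6, 100)
plt.plot(z, sigmoid(z), 'co')  # points only
plt.show()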
Synthetic problem for our neural network
One of the biggest challenges with deep learning is data, and a lot of it.
We will sidestep this problem by using a synthetic problem: we will model a predefined function.
We usually divide our dataset into (at least) two groups:
• Training, ~70% of our data.
• Test, ~30% of our data.
In real-life cases you are likely to have a validation set as well.
Our problem is to predict a function's behavior given points from that function, later comparing our model against newly generated points from the function.
Let's see an example.
Utils.py
You got a utils.py module; it contains some helper functions.
Git repo: https://gitlab.com/eshlomo/EazyDnn , utils.py is under base_network
Let's generate a pattern:
• Signature: the function is given a range, a number of samples and a generator, which is the function itself.
• Generate 100 samples of f(x) = x² between -1 and 1
• Plot the generated function (see the sketch below)
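A sketch of what such a generator might look like; the real helper lives in utils.py, so the name and argument order here are assumptions:

import numpy as np
import matplotlib.pyplot as plt

def generate_pattern(start, end, n_samples, generator):
    # hypothetical stand-in for the utils.py helper
    x = np.linspace(start, end, n_samples)
    return x, generator(x)

x, y = generate_pattern(-1, 1, 100, lambda x: x ** 2)
plt.plot(x, y, 'co')
plt.show()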
Before we go into our network
• This is a lot of info to push into a short time.
• You are likely not to be able to follow all the details in real time; to really feel you've got it, you need to train and train and …
• We are going through these details to give you a solid base for self-learning.
• Don't get too alarmed by the math.
• Feel free to contact me.
Let's model a line
[Plot: solution to the previous exercise, f(x) = x²]
We will create the following network: x → hidden layer → f̂(x), to model f(x) = x
• Generate 100 samples of f(x) = x between 0 and 1
• Plot the generated function
[Diagram: one input neuron, a hidden layer and one output neuron; weight vector w1 with elements w11(1), w12(1), w13(1) into the hidden layer, and w2 with elements w11(2), w21(2) out of it]
Network basics
We are going to train our network to estimate our line; in other words:
• We are going to find the best vectors w1, w2 that will make our network model our line.
• Training is an iterative process in which, in every iteration, we reduce our model error by making small changes to our weight vectors.
• For that purpose we define a cost (error) function on our model.
[Plot: solution to the previous exercise, f(x) = x]
Cost function
• Let's mark our model output as Ŷ, and our real output as Y
• We use the quadratic cost (marked J), closely related to the mean squared error and the sum of squared errors (it is also the maximum-likelihood cost under Gaussian noise): Error = J(Y) = ½(Ŷ − Y)²
• Keep in mind we know the real output; we have training data == ground truth == annotated data.
• Since we want to minimize our error, we would like to move our weights against the derivative at any given iteration.
• Similar to finding the minimum of a function in calculus, only done numerically. This process is called gradient descent.
Gradient descent
A process in which, every iteration, we:
• Predict our output – forward pass
• Calculate our error using our cost function: prediction vs. ground truth
• Calculate the error derivative with respect to our weights
• Update each weight with a small Δ opposite to the gradient direction (minimize) – backward pass
• Δ is called our learning rate
Let's start creating our network. Create a class that will manage our network (see the skeleton below):
• 3 layers; their sizes are constructor parameters
• 2 weight matrices
• Initialize all weights with the standard random normal
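A minimal skeleton under those constraints; the class and attribute names are assumptions, and later slides fill in the methods:

import numpy as np

class Network:
    def __init__(self, input_size, hidden_size, output_size):
        # two weight matrices for a 3-layer network,
        # drawn from the standard normal distribution
        self.W1 = np.random.randn(input_size, hidden_size)
        self.W2 = np.random.randn(hidden_size, output_size)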
Our forward pass
[Diagram: the neuron from before, now with a sigmoid activation, sigmoid(∑ₖ WₖXₖ); the hidden layer produces Z(2) and a(2)]
• Add a method to our class, called forward
• This method will calculate Ŷ, our model's predicted output
• Use the following naming conventions:
• Z(n) = a(n−1) W(n−1), with a(1) = X
• a(n) = sigmoid(Z(n)) = fa(Z(n))
• forward will return our model output, AKA the sigmoid of the last layer's activation sum: fa(Z(3)) (see the sketch below)
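A sketch of forward under these conventions, continuing the Network class above; it reuses the sigmoid function from the earlier exercise:

    def forward(self, X):
        # Z(2) = X W(1), a(2) = sigmoid(Z(2))
        self.z2 = np.dot(X, self.W1)
        self.a2 = sigmoid(self.z2)
        # Z(3) = a(2) W(2); the model output is sigmoid(Z(3))
        self.z3 = np.dot(self.a2, self.W2)
        return sigmoid(self.z3)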
Let's add our cost function
• Add a method to our class, called cost
• This method will calculate our cost in every iteration
• It gets our input and output as parameters and returns the cost: J(Y) = ½(Ŷ − Y)²
• Keep in mind Ŷ = forward(X) (see the sketch below)
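A matching sketch, also as a method of the class (summing the per-sample errors is an assumption):

    def cost(self, X, Y):
        # J = 1/2 * sum((y_hat - Y)^2)
        y_hat = self.forward(X)
        return 0.5 * np.sum((y_hat - Y) ** 2)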
Back propagation
Once we have the cost (error), we want to calculate the derivative of the cost with respect to each weight.
Remember: we want to move each weight in the direction opposite to its error gradient.
We want to minimize J(Y) = ½(Ŷ − Y)², where Ŷ = f(WX) → J(W) = ½(f(WX) − Y)²
So we need to calculate ∂J/∂W.
We have a composition of functions, so we need to use the chain rule.
The chain rule
• https://en.wikipedia.org/wiki/Chain_rule
• A way to compute the derivative of a function composition
We want to calculate ∂J/∂W; our network has two weight matrices, so we need both ∂J/∂W(1) and ∂J/∂W(2). Let's do some chaining, starting with ∂J/∂W(2):
∂J/∂W(2) = ∂/∂W(2) [½(Ŷ − Y)²] = (Ŷ − Y) ∂Ŷ/∂W(2)
Ŷ = fa(Z(3)), so ∂Ŷ/∂W(2) = (∂Ŷ/∂Z(3)) (∂Z(3)/∂W(2)) → we need to calculate the sigmoid derivative
Z(3) = a(2) W(2), so ∂Z(3)/∂W(2) = a(2) → linear propagation of the error per weight.
Some derivatives
Code a function called sigmoidPrime that calculates the sigmoid derivative for a matrix Z.
Code a function called costPrime that calculates ∂J/∂W(1) and ∂J/∂W(2) (see the sketch below).
Sigmoid derivative: d/dz [1/(1 + e^(-z))] = e^(-z)/(1 + e^(-z))²
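A sketch of both functions, following the chain-rule result above; wiring costPrime as a method of our Network class is an assumption:

def sigmoidPrime(z):
    # d/dz sigmoid(z) = e^(-z) / (1 + e^(-z))^2
    return np.exp(-z) / (1.0 + np.exp(-z)) ** 2

# as a method of the Network class:
def costPrime(self, X, Y):
    y_hat = self.forward(X)
    # error delta at the output layer, then propagated back to the hidden layer
    delta3 = (y_hat - Y) * sigmoidPrime(self.z3)
    dJdW2 = np.dot(self.a2.T, delta3)
    delta2 = np.dot(delta3, self.W2.T) * sigmoidPrime(self.z2)
    dJdW1 = np.dot(X.T, delta2)
    return dJdW1, dJdW2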
Our derivatives functions
Utils.py contains methods you should add to your class; add them.
Time for some training
Go over the methods you have just added – can you tell what they are doing?
In utils.py there is a class linear_trainer – what does it do?
In utils.py there is a method test_line – what does it do?
Create a network instance plus training and test data, and train your network using the function test_line (a sketch of such a training loop follows).
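The real linear_trainer lives in utils.py; as a rough sketch of what such a trainer does, a plain gradient-descent loop might look like this (the learning rate and iteration count are assumptions):

def train(net, X, Y, learning_rate=0.01, iterations=100):
    for _ in range(iterations):
        dJdW1, dJdW2 = net.costPrime(X, Y)
        # step each weight matrix against its gradient to reduce the cost
        net.W1 -= learning_rate * dJdW1
        net.W2 -= learning_rate * dJdW2
    return net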
Let's run some more
The default number of training iterations is 100; let's make it 10,000.
Much better. Now let's try 100K. Hmm… same result. Ideas?
When error is too high
We usually tend to (in a few minutes we will get into the why):
• Train for more time
• Get more data
• Get a bigger (deeper) model, which usually comes with more data.
Do the following:
• Install scipy
• Increase your hidden layer to size 30
• Replace the linear trainer with BFGS_trainer inside the method test_line – find it in utils.py
Optimization
Gradient descent looks for a minimum and can suffer from these problems:
• Getting stuck in a local minimum
• Getting stuck in a plateau
• A learning rate too big to reach the minimum – bouncing…
Read more at http://sebastianruder.com/optimizing-gradient-descent/
The Bias Variance tradeoff
We can look at our model error as follows:
[Diagram: total error = model error + noise]
Our error usually comes from a combination of these two. These are all equivalent:
• High variance = modeling noise = not enough data = model too big = overfit
• High bias = model too simple = underfit
Bias / Variance
[Three plots of the same data points with different fits: "Good model", "High bias", "High Variance"]
How can you tell which one of those you have?
Rules of thumb regarding Bias/Variance
• Good accuracy on training and test → good model
• Good accuracy on training, poor on test → overfit
• Poor on both → underfit
Set the training iterations back to 1,000, still with BFGS.
Generate the training data as before on the range 0-1, but the test data on the range 0-2.
How does it look? Can you guess why?
Deep neural nets have limits
• In general, the network is trained on bounded data
• It is likely not to generalize well out of bounds
• So you need your data sets to contain the full data range, OR
• Use a more suitable model; for curve prediction (time series), an RNN might have been a better choice here.
• Try to fit the following curve: 2.5 e^(-x/2) cos(πx), range 0-10
• Install tensorflow (CPU version; make sure you are on Python 3.5.x)
• Install keras
• Create our model using Keras (Google time…), as sketched below:
• Sequential model
• Dense layers are the sums
• Use relu activation
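One possible Keras model for this exercise; the layer sizes and training settings are assumptions, not the course's reference solution:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

x = np.linspace(0, 10, 1000).reshape(-1, 1)
y = 2.5 * np.exp(-x / 2) * np.cos(np.pi * x)

model = Sequential()
model.add(Dense(64, activation='relu', input_dim=1))  # Dense layers are the WX sums
model.add(Dense(64, activation='relu'))
model.add(Dense(1))  # linear output for regression
model.compile(optimizer='adam', loss='mse')
model.fit(x, y, epochs=200, verbose=0)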
Next steps if you wish to go deeper
• CNN and Caffe: http://adilmoujahid.com/posts/2016/06/introduction-deep-learning-python-caffe/
• Udacity deep learning course (TF examples walkthrough)
• Andrew Ng's ML course on Coursera
• Geoff Hinton's neural networks course on Coursera
• Stanford CS231n on YouTube
• RNN: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
• A quick tour of different network types, a great YouTube series:
• https://www.youtube.com/playlist?list=PLjJh1vlSEYgvGod9wWiydumYl8hOXixNu
eran.shlomo@intel.com