Deep Learning
Pierre de Lacaze
rpl@lispnyc.org
Lisp NYC
Tuesday, June 20th, 2017
Jane Street Capital
Overview
Principal Topics
1. Convolutional Neural Networks (CNNs)
2. Recurrent Neural Networks (RNNs)
Time permitting…
1. Generative Adversarial Networks (GANs)
2. Differentiable Neural Computers (DNCs)
3. Deep Reinforcement Learning (DRL)
Deep Neural Networks
• A deep neural network is a neural network with multiple
layers of hidden units.
– E.g. Multi-Layered Perceptrons (MLPs)
• Convolutional Neural Nets (CNNs)
– Biologically-inspired variants of MLPs
– Successfully used in image recognition, speech recognition
• Recurrent Neural Nets (RNN)
– Cyclic graphs in which later layers feed back into earlier layers
– Allow for a window of time into past data
– Successfully used for Natural Language Processing.
Application: Combining CNNs & RNNs
GENERATING IMAGE DESCRIPTIONS
Together with Convolutional Neural Networks, RNNs have been used as part of a model to generate
descriptions for unlabeled images. It’s quite amazing how well this seems to work. The combined model even
aligns the generated words with features found in the images.
Deep Visual-Semantic Alignments for Generating Image Descriptions. Source: http://cs.stanford.edu/people/karpathy/deepimagesent
Part 0
ANN Review &
Multi-Layered Perceptrons
(MLPs)
Multi Layered Perceptrons (MLPs) are fully
connected feed forward networks with several
layers of hidden units.
Linear Units and Perceptrons
• Linear Unit: A linear combination of weighted inputs (real-valued)
• Perceptron: Thresholded Linear Unit (discrete-valued)
Note: w0 is a bias whose purpose is to move the threshold of the activation function.
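A minimal Clojure sketch of these two units (not from the original slides; the convention that the bias w0 is paired with a constant input of 1 is an assumption for illustration):

;; A linear unit and a thresholded perceptron.
;; Assumed convention: (first weights) is the bias w0, paired with a constant input of 1.
(defn linear-unit
  "Weighted sum of the inputs plus the bias w0."
  [weights inputs]
  (reduce + (map * weights (cons 1 inputs))))

(defn perceptron
  "Thresholded linear unit: outputs 1 or -1."
  [weights inputs]
  (if (pos? (linear-unit weights inputs)) 1 -1))

;; Example: an AND-like perceptron over two binary inputs.
;; (perceptron [-1.5 1 1] [1 1]) => 1
;; (perceptron [-1.5 1 1] [0 1]) => -1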
Multi Layered Perceptrons
• These are fully connected Deep Feed Forward Networks
• Every output from previous layer is connected to every unit in the next layer
• They are typically trained using the Backpropagation Algorithm
• Backpropagation is effectively Gradient Descent applied to every unit in the network.
Image Credit: Michael Nielsen, Neural Networks and Deep Learning, Chapter 2.
Gradient Descent Motivation
Weight Space Error Surface
ANN Backpropagation Algorithm
(Using incremental gradient descent)
1. Initialize weights to small random numbers
2. Until the termination criterion is met, for each training example:
a. Compute the network outputs for the training example
b. For each output unit k compute its error:
δk = ok (1 – ok) (tk – ok)
c. For each hidden unit h compute its error:
δh = oh (1 – oh) Σk (whk δk)
d. Update each network weight wij
wij = wij + η δj xij
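A minimal Clojure sketch of steps 2b–2d for sigmoid units (not from the original slides; it assumes the forward pass of step 2a has already produced the output activations, the hidden activations, and the hidden-to-output weight matrix):

;; Step 2b: delta_k = o_k (1 - o_k) (t_k - o_k) for each output unit.
(defn output-deltas [outputs targets]
  (mapv (fn [o t] (* o (- 1 o) (- t o))) outputs targets))

;; Step 2c: delta_h = o_h (1 - o_h) * sum_k w_hk delta_k.
;; `weights` is a vector of rows, one row of outgoing weights per hidden unit.
(defn hidden-deltas [hidden-outputs weights output-ds]
  (mapv (fn [o w-row]
          (* o (- 1 o) (reduce + (map * w-row output-ds))))
        hidden-outputs weights))

;; Step 2d: w_ij <- w_ij + eta * delta_j * x_ij for a single weight.
(defn update-weight [w eta delta-j x-ij]
  (+ w (* eta delta-j x-ij)))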
Thoughtful Reminder Slide
Show Code
Examples
Identity Function Example
• Tom Mitchell, Machine Learning, Chapter 4, 1st edition.
(def if-td
[[[1 0 0 0 0 0 0 0] [1 0 0 0 0 0 0 0]]
[[0 1 0 0 0 0 0 0] [0 1 0 0 0 0 0 0]]
[[0 0 1 0 0 0 0 0] [0 0 1 0 0 0 0 0]]
[[0 0 0 1 0 0 0 0] [0 0 0 1 0 0 0 0]]
[[0 0 0 0 1 0 0 0] [0 0 0 0 1 0 0 0]]
[[0 0 0 0 0 1 0 0] [0 0 0 0 0 1 0 0]]
[[0 0 0 0 0 0 1 0] [0 0 0 0 0 0 1 0]]
[[0 0 0 0 0 0 0 1] [0 0 0 0 0 0 0 1]]])
• Ran 3 examples of MLPs on Identity function.
– A 1 hidden layer MLP: 8 x 3 x 8
– A 2 hidden layer MLP: 8 x 3 x 3 x 8
– A 3 hidden layer MLP: 8 x 3 x 3 x 3 x 8
MLP Training Comparisons
❶ MLP with 1 hidden layer of 3 hidden units: 4,500 iterations to converge
❷ MLP with 2 hidden layers of 3 hidden units: 28,000 iterations to converge
❸ MLP with 3 hidden layers of 3 hidden units: 1,000,000+ iterations to converge
Part 1
Convolutional Neural Nets
(CNNs)
Convolutional Neural Networks are
biologically-inspired variants of Multi Layered
Perceptrons (MLPs)
History of CNNs
• Research dates back to the 1970’s
• Seminal Paper on CNNs:
– Gradient-based learning applied to document recognition,
Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner, 1998
• Really took off in 2012
– ILSVRC (ImageNet Large-Scale Visual Recognition Challenge)
– 2012 ILSVRC: AlexNet, Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton
– 2013 ILSVRC: ZF Net, Matthew Zeiler and Rob Fergus, NYU
– 2014: VGG Net, Karen Simonyan and Andrew Zisserman, University of Oxford
CNN Overview
• A CNN typically consists of one or more convolutional and sampling layers
followed by one or more fully connected layers.
• Specifically designed to exploit 2D input such as an image or speech input
• Faster to train than fully connected networks.
• Sparse Connectivity
– CNNs exploit spatially-local correlation using local connectivity pattern between units of adjacent layers.
– These are called local receptive fields
• Shared Weights
– Replicated units share the same parameterization (weight vector and bias) and form a feature map.
• Max Pooling
– A form of non-linear down-sampling. Max pooling partitions the input image into a set of non-overlapping
rectangles and, for each such sub-region, outputs the maximum value.
Local Receptive Fields
• In a fully connected network, every input in the input layer is connected to
every hidden unit.
• This prevents the network from learning spatial features of the image.
• The idea is to map (connect) small rectangular sections of the image (inputs)
to different hidden units.
• These hidden units are called local receptive fields and result in a sparse
connectivity between the input layer and the first hidden layer.
• The stride length is the amount by which we shift the rectangular sections.
A stride of 1 pixel is typical.
• Different sets of local receptive fields form feature maps, each of which
represents a potentially different feature.
Feature Maps
• Each hidden unit shares the same set of weights and bias but
for a different spatial area of the input.
• This allows that layer to learn the same feature but for
different regions of the image.
• The complete hidden layer will in fact consist of several
feature maps. This is called a convolutional layer.
• The shared bias and weights in each feature map are often
called filters or kernels.
How Feature Maps Work
The amount by which
the local receptive field
is shifted is called the
stride length.
A stride length of 1 is
common.
All hidden units in a
feature map share the
same weights and bias.
This greatly reduces the
number of parameters in
a layer.
Image credit: Michael Nielsen’s Neural Networks and Deep Learning, Chapter 6.
Why Do Feature Maps Learn Different Features?
• From Quora: Andy Thomas
• Two reasons:
– The weights of the filters are randomly initialized
– Different feature maps reduce the cost function
• Random initialization of the weights will likely ensure each filter
converges to different local minima in the cost function. It is very
unlikely that each filter would begin to resemble other filters, as
that would almost certainly result in an increase of the cost
function and therefore no gradient descent algorithm would head
in that direction.
• Some feature maps may learn the same feature.
The Convolution Operator
• A Convolution is a simple mathematical operation common to many image
processing operators.
• Provides a way of “multiplying” two arrays of numbers of different sizes
but same dimensionality
• If the input image has M rows and N columns, and the kernel has m rows and n
columns,
• then the output image will have M - m + 1 rows and N - n + 1 columns.
• The purpose of Convolution in a CNN is to extract features from the input image.
• Convolution preserves the spatial relationship between pixels by learning image
features using small squares of input data
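A small Clojure sketch of the output-size rule just stated (illustrative only):

;; Output dimensions of a "valid" convolution: an M x N image convolved
;; with an m x n kernel produces an (M - m + 1) x (N - n + 1) output.
(defn conv-output-size [[M N] [m n]]
  [(+ (- M m) 1) (+ (- N n) 1)])

;; Example: a 28x28 MNIST image and a 5x5 kernel.
;; (conv-output-size [28 28] [5 5]) => [24 24]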
Output of the Convolutional Layer
• For each hidden unit in each feature map, only take
into account pixels in the local receptive field (sparse
connectivity)
• For each feature map, for the j-th, k-th hidden unit in
that feature map, assuming a 5×5 filter (aka kernel),
the output of that unit is given by:
– σ(b + ∑l=0..4 ∑m=0..4 wl,m aj+l,k+m)
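A Clojure sketch of this formula for a single hidden unit (illustrative; it assumes the image and the 5×5 kernel are vectors of row vectors and b is the shared bias):

(defn sigmoid [z] (/ 1.0 (+ 1.0 (Math/exp (- z)))))

;; Activation of the (j,k)-th unit of a feature map:
;; sigma(b + sum over l,m in 0..4 of w[l][m] * a[j+l][k+m])
(defn feature-map-unit [image kernel b j k]
  (sigmoid
   (+ b
      (reduce +
              (for [l (range 5), m (range 5)]
                (* (get-in kernel [l m])
                   (get-in image [(+ j l) (+ k m)])))))))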
Pooling
• A pooling layer typically follows a convolutional layer.
• Intuitively it is a down sampling of the previous layer.
• Max pooling is a technique that selects the maximum
activation from a set of units in the convolutional
layer (see the sketch after this list).
• Effectively, each feature map from the convolutional
layer is reduced to a smaller feature map.
• Other pooling techniques:
– L2 Pooling
• Takes the square root of the sum of the squares of a set of units
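A Clojure sketch of 2×2 max pooling (illustrative; it assumes the feature map is a vector of row vectors with even dimensions):

;; Replace each non-overlapping 2x2 block by its maximum activation.
(defn max-pool-2x2 [feature-map]
  (mapv (fn [[row-a row-b]]
          (mapv (fn [pair-a pair-b] (apply max (concat pair-a pair-b)))
                (partition 2 row-a)
                (partition 2 row-b)))
        (partition 2 feature-map)))

;; (max-pool-2x2 [[1 2 3 4]
;;                [5 6 7 8]
;;                [9 1 2 3]
;;                [4 5 6 7]])
;; => [[6 8] [9 7]]
;; L2 pooling would instead take (Math/sqrt (reduce + (map #(* % %) block))) over each block.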
How Pooling Works
• Pooling is a form of statistical aggregation or downsampling of the previous layer.
• Pooling layers have no learnable parameters
• While it is common, it is not required to have a pooling layer after a convolutional layer
Image Credit: Michael Nielsen, Neural Networks and Deep Learning, Chapter 6
Backpropagation in CNNs Overview
• Applying backpropagation to a convolutional layer is very similar to
applying backpropagation to a fully connected layer, except that errors and
gradients are computed separately for each filter.
• Applying backpropagation to a pooling layer involves using an
upsampling function which propagates the error over the sampling
function using its derivatives.
• Backpropagation for a fully connected layer is exactly the same as
for MLPs.
• Yoshua Bengio on Quora: “There is a general recipe for obtaining a
back-propagation algorithm associated with ANY computational
graph. You can find it described in my book, for example, in the
feedforward nets (mlp) chapter (6): DEEP LEARNING”
Backpropagation in CNNs
• Error and gradient for fully connected layers
• Error and gradient for convolutional layer
• k indexes the filter number, and upsample propagates the error through the pooling layer.
Slides from Hiroshi Kuwajima (visiting scholar at Stanford)
MNIST Data Set
• National Institute of Standards and Technology (NIST)
• Modified NIST Data Set maintained by Yann LeCun
• MNIST Data in CSV format
A Simple Architecture for MNIST
Image Credit: Michael Nielsen, Neural Networks and Deep Learning, Chapter 6.
• Input layer: 784 inputs encode the MNIST image
• Convolutional layer: 1728 units representing 3 feature maps
• Max-Pooling layer: 432 units representing 3 feature maps
• Output layer: 10 units, one for each digit
Shared Weights and Training CNNs
• CNN
– 28×28 = 784 input neurons
– 20 feature maps, each with 5×5 = 25 shared weights plus a shared bias: 20×26 = 520
– Total of 520 parameters to learn.
• MLP
– 784=28×28 inputs,
– 30 hidden units,
– Total of 784×30 = 23,520 weights,
– Total of 30 biases,
– Total of 23,550 parameters to learn.
• A single fully-connected layer would have more than 40 times as
many weights as the convolutional layer.
A CNN Architecture for MNIST
Image Credit: Michael Nielsen, Neural Networks and Deep Learning, Chapter 6
• 9,967 test images correctly classified out of 10,000
• Very similar to LeNet-5 architecture
• Softmax Regression, aka Multi-class Logistic Regression, is a generalization of
logistic regression that is used for multi-class classification and is based on the
softmax function.
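A Clojure sketch of the softmax function referenced above (illustrative; subtracting the maximum score is a standard numerical-stability trick, not something from the slides):

;; Convert raw scores into a probability distribution.
(defn softmax [zs]
  (let [m    (apply max zs)
        exps (map #(Math/exp (- % m)) zs)
        s    (reduce + exps)]
    (mapv #(/ % s) exps)))

;; (softmax [1.0 2.0 3.0]) => approximately [0.090 0.245 0.665]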
Incorrectly Classified MNIST Images
Of the 10,000 MNIST test images, 9,967 were correctly classified and 33 were incorrectly classified
What features are learned?
• The images above show the type of features the convolutional layer learns.
• Lighter regions mean a smaller, typically negative, weight
• Darker regions mean a larger weight
• Many of the features have distinguishable sub-regions of light and dark
• It’s clear that it’s learning “stuff” related to spatial structure
Performance Enhancements
• Regularization Terms to help with overfitting
– Regularization is a technique that adds a penalty (e.g. on large weights) to
your loss function (see the sketch after this list).
• Ensemble methods
– Train several nets and have them vote on the output.
• Generating expanded data sets
– Basically apply distortions to the original data set
– E.g. 50,000 images → 250,000 images
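A Clojure sketch of the L2 regularization term mentioned in the first bullet (illustrative; the flat weight sequence and the strength lambda are assumptions):

;; L2 (weight decay) penalty: (lambda / 2) * sum of squared weights.
(defn l2-penalty [lambda weights]
  (* (/ lambda 2.0) (reduce + (map #(* % %) weights))))

;; The penalized loss that is actually minimized during training.
(defn regularized-loss [base-loss lambda weights]
  (+ base-loss (l2-penalty lambda weights)))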
Expanded Generated Data Sets
Image credit: Tijmen Tieleman, University of Toronto
CNN Summary
• There are four main operations in a CNN:
– Convolution
– Non Linearity (ReLU)
– Pooling or Sub Sampling
– Classification (Fully Connected Layer)
• These operations are the basic building blocks of every CNN.
• CNNs are faster to train than MLPs because fewer parameters need to be learned.
• Work well with two-dimensional data in which locality is meaningful,
– e.g. object recognition in images.
• CNNs can also be used with higher-dimensional data
– e.g. MRI images
• Additional convolutional layers provide higher-level features (meta-features)
• Pooling layers progressively reduce the spatial size of the representation to reduce the number of features and the
computational complexity of the network
• Fully Connected layer at the end provides the classifier
• Networks based on Rectified Linear Units (ReLU) typically outperform networks based on saturating activation functions (sigmoid or
tanh).
Part 2
Recurrent Neural Nets
(RNNs)
Recurrent Neural Networks are a family of
Neural Networks for processing sequential data.
Recurrent Neural Nets Overview
• Leverage the ideas
– unfolding computational graphs
– parameter-sharing to abstract away input position
• “In 2009 I visited Nepal” vs “I visited Nepal in 2009”
• RNNs contain cycles, so information from earlier time steps can flow back into the
network.
– They are networks with loops in them, allowing information to persist.
• Different flavors of RNNs
– An output at each time-step and recurrent connections between hidden units
– An output at each time-step and recurrent connections only from output units
– An output only after the entire sequence is fed into the network and connections between
hidden units.
• RNNs can simulate a Turing Machine and can represent any computable function
– Siegelmann and Sontag, 1995.
– Used an RNN of finite size consisting of 886 units
RNNs in Practice
• Types of RNN used in Practice
– Vanilla RNNs
– Bidirectional RNNs
– Deep Bidirectional RNNs
– Long Short-Term Memory (LSTM)
• Practical Applications of RNNs
– Language Modeling And Generating Text
– Machine Translation
– Speech Recognition
– Generating Image Descriptions
Computational Graphs
• Computational Graph: Formalization of the
structure of a set of computations.
• Unfolding a recursive computation into a
graph with repetitive structure results in
parameter sharing across a deep network
structure.
• Essentially any function involving a recurrence can be considered an RNN
• Hidden Units in RNN:
– h(t) = f(h(t-1), x(t), θ)
– Notice that θ is the same at each time step.
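A Clojure sketch of this recurrence for a vanilla RNN with f = tanh (illustrative; the parameter names W-hh, W-xh and b are assumptions, and matrices are plain vectors of row vectors):

(defn dot [u v] (reduce + (map * u v)))
(defn mat-vec [m v] (mapv #(dot % v) m))
(defn v+ [u v] (mapv + u v))

;; One time step: h_t = tanh(W_hh h_{t-1} + W_xh x_t + b).
;; The parameter map theta is shared across all time steps.
(defn rnn-step [{:keys [W-hh W-xh b]} h-prev x]
  (mapv #(Math/tanh %)
        (v+ (v+ (mat-vec W-hh h-prev) (mat-vec W-xh x)) b)))

;; Running over a whole input sequence xs from an initial state h0:
;; (reductions (partial rnn-step theta) h0 xs)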
Unfolding an RNN
Training RNNs
• Backpropagation in Computational Graphs
– Backpropagation can be derived for any computational graph by recursively applying the chain
rule. (Deep Learning, Chapter 6)
– The backpropagation algorithm consists of performing a Jacobian-gradient product for each
operation in the graph
– In vector calculus, the Jacobian matrix is the matrix of all first-order partial derivatives of a
vector-valued function
• Backpropagation Through Time (BPTT).
– Gradient at each output depends not only on the calculations of the current time step, but
also the previous time steps.
– Vanilla RNNs trained with BPTT have difficulties learning long-term dependencies, i.e.
dependencies between steps (words) that are far apart.
• “I grew up in France… I speak fluent French”
– Suffers from vanishing/exploding gradient problem.
• Vanishing gradient: your gradients get smaller and smaller in magnitude as you backpropagate through earlier
layers (or through time).
• Saturating activation functions like the sigmoid have derivatives of at most 0.25, which easily causes the gradient
to vanish in earlier layers.
• Exploding gradient: more of an issue with recurrent networks, where the opposite happens due to a Jacobian
with determinant greater than 1.
– Certain types of RNNs (like LSTMs) were specifically designed to get around these problems.
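A tiny Clojure illustration of the vanishing gradient (not from the slides): the sigmoid derivative is at most 0.25, so backpropagating through many sigmoid steps multiplies the gradient by a factor of at most 0.25 each time.

(defn sigmoid [z] (/ 1.0 (+ 1.0 (Math/exp (- z)))))
(defn sigmoid-deriv [z] (let [s (sigmoid z)] (* s (- 1 s))))

;; Best case: 20 steps at the maximum derivative, (sigmoid-deriv 0.0) = 0.25.
;; (reduce * (repeat 20 (sigmoid-deriv 0.0))) => roughly 9.1e-13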
Long Short Term Memory (LSTM)
• LSTMs are a special kind of RNN, capable of learning long-term dependencies.
• Successful in handwriting recognition, speech recognition, image captioning and machine
translation
• Type of gated network
• Introduced by Hochreiter & Schmidhuber (1997)
– Added self-loops which allowed gradient to flow for long durations.
– Weight on the self-loop based on context rather than fixed. (Gers et al., 2000)
– Based on the idea of creating paths through the network in which the gradient neither vanishes nor
explodes.
• Leaky units allowed information to accumulate over a long duration
• LSTMs generalize leaky units by allowing connection weights to change over time.
• LSTMs allow the network to decide when to forget information.
• A single hidden unit in an LSTM is replaced with a recurrent network cell consisting of 4
components that interact with each other.
Gated Network Cells
• Gated network cells replace the hidden units of RNNs
• The input feature is computed by a regular ANN unit.
• The input can be accumulated if the input gate allows it.
• The state has a self-loop controlled by the forget gate
• The output can be turned off by the output gate
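A Clojure sketch of one step of such a gated cell, following the standard LSTM equations (illustrative; the per-gate parameter layout with W, U and b is an assumption):

(defn dot [u v] (reduce + (map * u v)))
(defn mat-vec [m v] (mapv #(dot % v) m))
(defn v+ [u v] (mapv + u v))
(defn v* [u v] (mapv * u v))
(defn sigmoid-v [v] (mapv #(/ 1.0 (+ 1.0 (Math/exp (- %)))) v))
(defn tanh-v [v] (mapv #(Math/tanh %) v))

;; Affine transform of the input x and previous hidden state h for one gate.
(defn gate [{:keys [W U b]} x h] (v+ (v+ (mat-vec W x) (mat-vec U h)) b))

;; One LSTM step: returns the new [cell-state hidden-state].
(defn lstm-step [{:keys [Wi Wf Wo Wg]} [c-prev h-prev] x]
  (let [i (sigmoid-v (gate Wi x h-prev))   ; input gate: what to let in
        f (sigmoid-v (gate Wf x h-prev))   ; forget gate: what to keep
        o (sigmoid-v (gate Wo x h-prev))   ; output gate: what to expose
        g (tanh-v    (gate Wg x h-prev))   ; candidate cell update
        c (v+ (v* f c-prev) (v* i g))      ; self-loop controlled by the forget gate
        h (v* o (tanh-v c))]               ; output can be turned off by the output gate
    [c h]))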
LSTM in NLP Generation
Image credit: Google Research Blog
LSTM Summary
• A type of RNN architecture that addresses the
vanishing/exploding gradient problem.
• LSTM allow the learning of long-term
dependencies which is crucial for sequences
of inputs.
• Recently achieved state-of-the-art
performance in speech recognition, language
modeling, translation, image captioning
Additional Topics…
• Generative Adversarial Networks (GANs)
• Deep Reinforcement Learning (DRL)
• Differentiable Neural Computers (DNCs)
Part 3
Generative Adversarial
Networks
(GANs)
Generative Adversarial Networks are an example of generative
models. GANs focus primarily on sample generation, though it is
possible to design GANs that can estimate the probability
distribution.
GAN Framework
• Based on the idea of a two player game
– Player 1: Generator
– Player 2: Discriminator
• The generator generates samples and tries to
fool the discriminator
• The discriminator determines if the generated
samples are real or fake
Why GANs are useful
• When predicting the next frame in a video, using the Mean Squared Error
(MSE) causes an averaging over many possible futures, which makes the
ear disappear and the eyes blur
• The adversarial version does a much better job preserving the ear and not
blurring the eyes.
Image credit: Ian Goodfellow, GANs Tutorial, NIPS 2016
GANs Summary
• GANs are generative models that use
supervised learning to approximate an
intractable cost function
• GANs require finding Nash equilibria in high
dimensional, continuous, non-convex games.
• GANs are crucial to many different state of the
art image generation and manipulation
systems.
Part 4
Deep Reinforcement Learning
(DRL)
Deep Reinforcement Learning combines both Deep Learning and
Reinforcement Learning by using Deep Learning techniques to learn values
for the Q Function in Reinforcement Learning. This is described in Google
DeepMind’s Atari paper and exemplified by the AlphaGo program
Deep Reinforcement Learning
• Combines Reinforcement Learning with Deep Learning
• A form of model-free learning (it learns without an explicit model of the environment)
• Uses Neural Nets to estimate Q Values.
• Very new field. No Wikipedia Page on this topic.
• The idea is to feed states and actions into the network to predict Q values.
• Neural networks are exceptionally good at coming up with good features
for highly structured data.
• This is the technology used by Google DeepMind’s AlphaGo program.
Reinforcement Learning Revisited
• Definitions
– Policy π is a way of selecting an action given a state
– Value function Qπ (s,a) is the expected total reward for
performing action a from state s given policy π
• Different Approaches
– Policy Based RL
• Search for the optimal policy in space of policies
– Value-based RL
• Estimate optimal value function Q*(s,a)
– Model-based RL
• Build a model of the environment and use look ahead
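For value-based RL, a Clojure sketch of the tabular Q-learning update (illustrative; the map-based Q-table, learning rate alpha and discount factor gamma are assumptions):

;; Q(s,a) <- Q(s,a) + alpha * (r + gamma * max over a' of Q(s',a') - Q(s,a))
(defn q-update [q-table actions alpha gamma [s a r s2]]
  (let [q-sa    (get q-table [s a] 0.0)
        best-s2 (apply max (map #(get q-table [s2 %] 0.0) actions))
        target  (+ r (* gamma best-s2))]
    (assoc q-table [s a] (+ q-sa (* alpha (- target q-sa))))))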
The Many States Problem
• In DeepMind’s Nature Atari paper:
• Take the last four screen images, resize them to 84×84, and
convert them to grayscale with 256 gray levels.
• This yields 256^(84×84×4) ≈ 10^67970 possible game states.
• This means 10^67970 rows in our imaginary Q-table.
• That is more than the number of atoms in the known
universe!
Deep-Q Architecture
Deep Q-Learning Error & Gradient
• Represent Q function using a deep network.
• Error function
• Gradient
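A standard form of the deep Q-learning error, consistent with the description above (a reminder, not necessarily the exact formula on the slide): the parameters θ are adjusted to minimize

L(θ) = E[ (r + γ maxa′ Q(s′, a′; θ) − Q(s, a; θ))² ]

and the gradient is taken with respect to θ while treating the target r + γ maxa′ Q(s′, a′; θ) as a fixed value.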
Strategies & Tricks
• Experience Replay
– During gameplay all the experiences <s,a,r,s′> are stored in a replay memory.
– When training the network, random samples from the replay memory are
used instead of the most recent transition.
– This breaks the similarity of subsequent training samples, which otherwise
might drive the network into a local minimum.
– Also experience replay makes the training task more similar to usual
supervised learning, which simplifies debugging and testing the algorithm.
– One could actually collect all those experiences from human gameplay and then
train the network on these (see the sketch after this list).
• Exploration-Exploitation
– ε-greedy exploration
– with probability ε choose a random action, otherwise go with the “greedy”
action with the highest Q-value.
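A Clojure sketch of these two tricks (illustrative; the replay memory is assumed to be a vector of <s,a,r,s′> tuples and q-values a map from actions to their estimated values):

;; Experience replay: draw a random minibatch from the replay memory
;; (sampling with replacement here, for simplicity).
(defn sample-minibatch [replay-memory n]
  (repeatedly n #(rand-nth replay-memory)))

;; Epsilon-greedy: explore with probability epsilon, otherwise act greedily.
(defn epsilon-greedy [epsilon q-values]
  (if (< (rand) epsilon)
    (rand-nth (keys q-values))
    (key (apply max-key val q-values))))

;; (epsilon-greedy 0.1 {:left 0.2, :right 0.7, :fire 0.5})
;; => usually :right, occasionally a random action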
Deep Q-Learning Algorithm
DeepMind Atari Deep-Q Network
References (1)
• Neural Nets & Deep Learning
– http://neuralnetworksanddeeplearning.com/chap2.html
– http://deeplearning.net/tutorial/deeplearning.pdf
• Convolutional Neural Networks
– http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
– http://neuralnetworksanddeeplearning.com/chap6.html
– http://cs231n.github.io/convolutional-networks/
– Visualizing and Understanding Convolutional Networks
– Convolutional Neural Networks backpropagation: from intuition to derivation
– An Intuitive Explanation of Convolutional Neural Networks
– Backpropagation in Convolutional Neural Networks
• Recurrent Neural Nets
– http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf
– http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
– http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-2-implementing-a-language-model-rnn-with-python-numpy-and-theano/
– http://www.wildml.com/2015/10/recurrent-neural-networks-tutorial-part-3-backpropagation-through-time-and-vanishing-gradients/
– http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-4-implementing-a-grulstm-rnn-with-python-and-theano/
References (2)
• Generative Adversarial Networks
– NIPS 2016 Tutorial: Generative Adversarial Networks
• Deep Reinforcement Learning
– http://www0.cs.ucl.ac.uk/staff/d.silver/web/Resources_files/deep_rl.pdf
– http://neuro.cs.ut.ee/demystifying-deep-reinforcement-learning/
• Differentiable Neural Computers
– https://deepmind.com/blog/differentiable-neural-computers/
• Google DeepMind DRL Atari Paper
– https://storage.googleapis.com/deepmind-media/alphago/AlphaGoNaturePaper.pdf
Questions
• Goodfellow quote on BP on Quora
• Vanishing / exploding gradient
Contenu connexe

Tendances

Intro to deep learning
Intro to deep learning Intro to deep learning
Intro to deep learning David Voyles
 
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural NetworksPR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural NetworksJinwon Lee
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Gaurav Mittal
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural networkMojammilHusain
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detectionBrodmann17
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningMohamed Loey
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Appsilon Data Science
 
CNN Machine learning DeepLearning
CNN Machine learning DeepLearningCNN Machine learning DeepLearning
CNN Machine learning DeepLearningAbhishek Sharma
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Suraj Aavula
 
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...Sujit Pal
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep LearningOswald Campesato
 
Convolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsConvolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsKasun Chinthaka Piyarathna
 
ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)Sanjay Saha
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNNShuai Zhang
 
Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learningleopauly
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Jihong Kang
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer VisionSungjoon Choi
 

Tendances (20)

Intro to deep learning
Intro to deep learning Intro to deep learning
Intro to deep learning
 
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural NetworksPR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 
Deep learning
Deep learningDeep learning
Deep learning
 
Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learning
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep Learning
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)
 
CNN Machine learning DeepLearning
CNN Machine learning DeepLearningCNN Machine learning DeepLearning
CNN Machine learning DeepLearning
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)
 
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
Convolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsConvolutional Neural Network and Its Applications
Convolutional Neural Network and Its Applications
 
ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNN
 
Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learning
 
Deep learning ppt
Deep learning pptDeep learning ppt
Deep learning ppt
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer Vision
 

Similaire à Deep Learning

build a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in Pythonbuild a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in PythonKv Sagar
 
A brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsA brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsShunta Saito
 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxssuser3aa461
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningSujit Pal
 
Deep learning from a novice perspective
Deep learning from a novice perspectiveDeep learning from a novice perspective
Deep learning from a novice perspectiveAnirban Santara
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learningJunaid Bhat
 
convolutional_neural_networks in deep learning
convolutional_neural_networks in deep learningconvolutional_neural_networks in deep learning
convolutional_neural_networks in deep learningssusere5ddd6
 
A Survey of Convolutional Neural Networks
A Survey of Convolutional Neural NetworksA Survey of Convolutional Neural Networks
A Survey of Convolutional Neural NetworksRimzim Thube
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural networkFerdous ahmed
 
Fundamental of deep learning
Fundamental of deep learningFundamental of deep learning
Fundamental of deep learningStanley Wang
 
Lecture on Deep Learning
Lecture on Deep LearningLecture on Deep Learning
Lecture on Deep LearningYasas Senarath
 
DSRLab seminar Introduction to deep learning
DSRLab seminar   Introduction to deep learningDSRLab seminar   Introduction to deep learning
DSRLab seminar Introduction to deep learningPoo Kuan Hoong
 
Deep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistryDeep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistryKenta Oono
 
Autoencoders for image_classification
Autoencoders for image_classificationAutoencoders for image_classification
Autoencoders for image_classificationCenk Bircanoğlu
 

Similaire à Deep Learning (20)

DL.pdf
DL.pdfDL.pdf
DL.pdf
 
build a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in Pythonbuild a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in Python
 
A brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsA brief introduction to recent segmentation methods
A brief introduction to recent segmentation methods
 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptx
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
 
cnn.pdf
cnn.pdfcnn.pdf
cnn.pdf
 
Cnn
CnnCnn
Cnn
 
Deep learning from a novice perspective
Deep learning from a novice perspectiveDeep learning from a novice perspective
Deep learning from a novice perspective
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
convolutional_neural_networks in deep learning
convolutional_neural_networks in deep learningconvolutional_neural_networks in deep learning
convolutional_neural_networks in deep learning
 
Mnist report
Mnist reportMnist report
Mnist report
 
A Survey of Convolutional Neural Networks
A Survey of Convolutional Neural NetworksA Survey of Convolutional Neural Networks
A Survey of Convolutional Neural Networks
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
 
Fundamental of deep learning
Fundamental of deep learningFundamental of deep learning
Fundamental of deep learning
 
Deep learning (2)
Deep learning (2)Deep learning (2)
Deep learning (2)
 
Mnist report ppt
Mnist report pptMnist report ppt
Mnist report ppt
 
Lecture on Deep Learning
Lecture on Deep LearningLecture on Deep Learning
Lecture on Deep Learning
 
DSRLab seminar Introduction to deep learning
DSRLab seminar   Introduction to deep learningDSRLab seminar   Introduction to deep learning
DSRLab seminar Introduction to deep learning
 
Deep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistryDeep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistry
 
Autoencoders for image_classification
Autoencoders for image_classificationAutoencoders for image_classification
Autoencoders for image_classification
 

Plus de Pierre de Lacaze

Babar: Knowledge Recognition, Extraction and Representation
Babar: Knowledge Recognition, Extraction and RepresentationBabar: Knowledge Recognition, Extraction and Representation
Babar: Knowledge Recognition, Extraction and RepresentationPierre de Lacaze
 
Reinforcement Learning and Artificial Neural Nets
Reinforcement Learning and Artificial Neural NetsReinforcement Learning and Artificial Neural Nets
Reinforcement Learning and Artificial Neural NetsPierre de Lacaze
 

Plus de Pierre de Lacaze (7)

Babar: Knowledge Recognition, Extraction and Representation
Babar: Knowledge Recognition, Extraction and RepresentationBabar: Knowledge Recognition, Extraction and Representation
Babar: Knowledge Recognition, Extraction and Representation
 
Reinforcement Learning and Artificial Neural Nets
Reinforcement Learning and Artificial Neural NetsReinforcement Learning and Artificial Neural Nets
Reinforcement Learning and Artificial Neural Nets
 
Logic Programming and ILP
Logic Programming and ILPLogic Programming and ILP
Logic Programming and ILP
 
Meta Object Protocols
Meta Object ProtocolsMeta Object Protocols
Meta Object Protocols
 
Prolog 7-Languages
Prolog 7-LanguagesProlog 7-Languages
Prolog 7-Languages
 
Clojure 7-Languages
Clojure 7-LanguagesClojure 7-Languages
Clojure 7-Languages
 
Knowledge Extraction
Knowledge ExtractionKnowledge Extraction
Knowledge Extraction
 

Dernier

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 

Dernier (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 

Deep Learning

  • 1. Deep Learning Pierre de Lacaze rpl@lispnyc.org Lisp NYC Tuesday, June 20th, 2017 Jane Street Capital
  • 2. Overview Principal Topics 1. Convolutional Neural Networks (CNNs) 2. Recurrent Neural Networks (RNNs) Time permitting… 1. Generative Adversarial Networks (GANs) 2. Differentiable Neural Computers (DNCs) 3. Deep Reinforcement Learning (DRL)
  • 3. Deep Neural Networks • A deep neural network is a neural network with multiple layers of hidden units. – E.g. MLPs: Multi-Layered Perceptrons (MLPs) • Convolutional Neural Nets (CNNs) – Biologically-inspired variants of MLPs – Successfully used in image recognition, speech recognition • Recurrent Neural Nets (RNN) – Cyclic graphs where next layers feeds into previous layers – Allow for a window of time into past data – Successfully used or Natural Language processing.
  • 4. Application: Combining CNNs & RNNs GENERATING IMAGE DESCRIPTIONS Together with convolutional Neural Networks, RNNs have been used as part of a model to generate descriptions for unlabeled images. It’s quite amazing how well this seems to work. The combined model even aligns the generated words with features found in the images. Deep Visual-Semantic Alignments for Generating Image Descriptions. Source: http://cs.stanford.edu/people/karpathy/deepimagesent
  • 5. Part 0 ANN Review & Multi-Layered Perceptrons (MLPs) Multi Layered Perceptrons (MLPs) are fully connected feed forward networks with several layers of hidden units.
  • 6. Linear Units and Perceptrons • Linear Unit: A linear combination of weighted inputs (real-valued) • Perceptron: Thresholded Linear Unit (discrete-valued) Note: w0 is a bias whose purpose is to move the threshold of the activation function.
  • 7. Multi Layered Perceptrons • These are fully connected Deep Feed Forward Networks • Every output from previous layer is connected to every unit in the next layer • They are typically trained using the Backprogation Algorithm • Backprogation is effectively Gradient Descent applied to every unit in the network. Image Credit: Michael Bernstein, Neural Networks and Deep Learning, Chapter 2.
  • 9. ANN Backpropagation Algorithm (Using incremental gradient descent) 1. Initial weights to small random numbers 2. Until termination criteria for each training example a. Compute the network outputs for the training example b. For each output unit k compute its error: δk = ok (1 – ok) (tk – ok) c. For each hidden unit h compute its error: δh = oh (1 – oh) Σ (whk δk ) k d. Update each network weight wij wij = wij + η δh xij
  • 11. Identity Function Example • Tom Mitchell, Machine Learning, Chpt 4., 1st edition. (def if-td [[[1 0 0 0 0 0 0 0] [1 0 0 0 0 0 0 0]] [[0 1 0 0 0 0 0 0] [0 1 0 0 0 0 0 0]] [[0 0 1 0 0 0 0 0] [0 0 1 0 0 0 0 0]] [[0 0 0 1 0 0 0 0] [0 0 0 1 0 0 0 0]] [[0 0 0 0 1 0 0 0] [0 0 0 0 1 0 0 0]] [[0 0 0 0 0 1 0 0] [0 0 0 0 0 1 0 0]] [[0 0 0 0 0 0 1 0] [0 0 0 0 0 0 1 0]] [[0 0 0 0 0 0 0 1] [0 0 0 0 0 0 0 1]]]) • Ran 3 examples of MLPs on Identity function. – A 1 hidden layer MLP: 8 x 3 x 8 – A 2 hidden layer MLP: 8 x 3 x 3 x 8 – A 3 hidden layer MLP: 8 x 3 x 3 x 3 x 8
  • 12. MLP Training Comparisons ❶ MLP with 1 hidden layer of 3 hidden units: 4,500 iterations to converge ❷ MLP with 2 hidden layers of 3 hidden units: 28,000 iteration to converge ❸ MLP with 3 hidden layers of 3 hidden units: 1,000,000+ iterations to converge
  • 13. Part 1 Convolutional Neural Nets (CNNs) Convolutional Neural Networks are biologically-inspired variants of Multi Layered Perceptrons (MLPs)
  • 14. History of CNNs • Research dates back to the 1970’s • Seminal Paper on CNNs: – Gradient-based learning applied to document recognition, Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner, 1998 • Really took off in 2012 – ILSVRC (ImageNet Large-Scale Visual Recognition Challenge) – 2012 ILSBRC: AlexNet , Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton – 2013 ILBSRC: ZF Net, Matthew Zeiler and Rob Fergus , NYU – 2014: VGG Net, Karen Simonyan and Andrew Zisserman, University of Oxford
  • 15. CNN Overview • A CNN typically consists of one or more convolutional and sampling layers followed by one or more fully connected layers. • Specifically designed to exploit 2D input such an image or speech input • Faster to to train than fully connected networks. • Sparse Connectivity – CNNs exploit spatially-local correlation using local connectivity pattern between units of adjacent layers. – These are called local receptive fields • Shared Weights – Replicated units share the same parameterization (weight vector and bias) and form a feature map. • Max Pooling – A form of non-linear down-sampling. Max pooling partitions the input image into a set of non-overlapping rectangles and, for each such sub-region, outputs the maximum value.
  • 16. Local Receptive Fields • In a fully connected network, every input in the input layer is connected to every hidden unit. • This prevents the network from learning spatial features of the image. • The idea is to map (connect) small rectangular sections of the image (inputs) to different hidden units. • These hidden units are called local receptive fields and result in a sparse connectivity between the input layer and the first hidden layer. • The stride length is the amount by which we shift the rectangular sections. Typically use rectangular sections shifted over 1 pixel • Different sets of local receptive fields form feature maps each of which represent a potentially different feature.
  • 17. Feature Maps • Each hidden unit shares the same set of weights and bias but for a different spatial area of the input. • This allows that layer to learn the same feature but for different regions of the image. • The complete hidden layer will in fact consist of several feature maps. This is called a convolutional layer. • The shared bias and weights in each feature map are often called filters or kernels.
  • 18. How Feature Maps Work The amount by which the local receptive field is shifted is called the stride length. A stride length of 1 is common. All hidden units in a feature map share the same weights and bias. This greatly reduces the number of parameters in a layer. Image credit: Michael Nielsen’s Neural Networks and Deep Learning, Chapter 6.
  • 19. Why Do Feature Maps Learn Different Features? • From Quora: Andy Thomas • Two reasons: – The weights of the filters are randomly initialized – Different feature maps reduce the cost function • Random initialization of the weights will likely ensure each filter converges to different local minima in the cost function. It is very unlikely that each filter would begin to resemble other filters, as that would almost certainly result in an increase of the cost function and therefore no gradient descent algorithm would head in that direction. • Some feature maps may learn the same feature.
  • 20. The Convolution Operator • A Convolution is a simple mathematical operation common to many image processing operators. • Provides a way of “multiplying” two arrays of numbers of different sizes but same dimensionality • Input image has M rows and N columns, and the kernel has m rows and n columns, • The output image will have M - m + 1 rows, and N - n + 1 columns. • The purpose of Convolution in a CNN is to extract features from the input image. • Convolution preserves the spatial relationship between pixels by learning image features using small squares of input data
  • 21. Output of the Convolutional Layer • For each hidden unit in each feature map, only take into account pixels in the local receptive field (sparse connectivity) • For each feature map, for the jth ,kth hidden unit in that feature map, assuming a 5x5 filter (aka kernel), the output of that unit is given by: –σ (b + ∑ l=0,4 ∑ m=0,4 wl,m a j+l,k+m)
  • 22. Pooling • A pooling layer typically follows a convolutional layer. • Intuitively it is a down sampling of the previous layer. • Max pooling is technique that selects the maximum activation from a set of units from the convolutional layer. • Effectively take each feature map from convolutional layer and produce a reduced feature map. • Other pooling techniques: – L2 Pooling • Takes the square root of the sum of the squares of a set of units
  • 23. How Pooling Works • Pooling is a form of statistical aggregation or downsampling of the previous layer. • Pooling layers do not learn anything • While it is common, it is not required to have a pooling after a convolutional layer Image Credit: Michael Nielsen, Neural Networks and Deep Learning, Chapter 6
  • 24. Backpropagation in CNNs Overview • Applying backprogation to a convolutional layer is very similar to applying backprogation to a fully connected except that errors and gradients are computed separately for each filter. • Applying backpropagation to a pooling layer involves using an upsampling function which propagates the error over the sampling function using its derivatives. • Backpropagation for a fully connected layer is exactly the same as for MLPs. • Yoshua Bengio on Quora: “There is a general recipe for obtaining a back-propagation algorithm associated with ANY computational graph. You can find it described in my book, for example, in the feedforward nets (mlp) chapter (6): DEEP LEARNING”
  • 25. Backpropagation in CNNs • Error and gradient for fully connected layers • Error and gradient for convolutional layer • k indexes the filter number and upsample propagates error through pooling layer)
  • 26. Slides from Hiroshi Kuwajima (visiting scholar at Stanford)
  • 27. MNIST Data Set • National Institute for Standards and Technology (NIST) • Modified NIST Data Set maintained by Yan LeCun • MNIST Data in CSV format
  • 28. A Simple Architecture for MNIST Image Credit: Michael Bernstein, Neural Networks and Deep Learning, Chapter 6. • Input layer: 764 inputs encode the MNIST image • Convolutional layer: 1728 units representing 3 feature maps • Max-Pooling layer: 432 units representing 3 feature maps • Output layer: 10 units, one for each digit MNIST dataset
  • 29. Shared Weights and Training CNNs • CNN – 28×28 = 784 input neurons – 20 feature maps 20×26=520 – Total of 520 weights to learn. • MLP – 784=28×28 inputs, – 30 hidden units, – Total of 784×30 weights = 23520 – Total of 30 biases, – Total of 23,550 weights to learn. • A single fully-connected layer would have more than 40 times as many weights as the convolutional layer.
  • 30. A CNN Architecture for MNIST Image Credit: Michael Nielsen, Neural Networks and Deep Learning, Chapter 6 • 9,967 Test images correctly classified out 10,000 • Very similar to LeNet-5 architecture • Softmax Regression aka Multi-class Logistic Regression is a generalization of logistic regression that is used for multi-class classification and based of the softmax function.
  • 31. Incorrectly Classified MNIST Images Of the 10,000 MNIST test images 9,967 correctly classified, 33 incorrectly classified
  • 32. What features are learned? • The images above show the type of features the convolutional learns. • Lighter regions mean a smaller, typically negative weight, • Darker region mean a larger weight • Many of the features have distinguishable sub-regions of light and dark • It’s clear that it’s learning “stuff” related to spatial structure
  • 33. Performance Enhancements • Regularization Terms to help with overfitting – Regularization is technique that allows you to penalize your loss function. • Ensemble methods – Train several nets and have them vote on the output. • Generative expanded data sets – Basically apply distortions to original data set – E.g. 50,000 images  250,000 images
  • 34. Expanded Generated Data Sets Image credit: Tijmen Tieleman, University of Toronto
  • 35. CNN Summary • There are four main operations in a CNN: – Convolution – Non Linearity (ReLU) – Pooling or Sub Sampling – Classification (Fully Connected Layer) • These operations are the basic building blocks of every CNN. • CNN’s Faster to train than MLPs because fewer parameters need to be learned. • Work well with two-dimensional data in which locality is meaningful, – e.g. object recognition in images. • CNN can also be used with higher dimensional data – e.g. MRI Images • Addition convolutional layers provide higher level features (meta features) • Pooling layers progressively reduce the spatial size of the representation to reduce the amount of features and the computational complexity of the network • Fully Connected layer at the end provides the classifier • Rectified Linear Units (ReLU) typically outperform networks based on sigmoid activation functions (sigmoid or tanh).
• 36. Part 2 Recurrent Neural Nets (RNNs)
Recurrent Neural Networks are a family of neural networks for processing sequential data.
• 37. Recurrent Neural Nets Overview
• Leverage two ideas:
– unfolding computational graphs
– parameter sharing to abstract away input position
• “In 2009 I visited Nepal” vs “I visited Nepal in 2009”
• RNNs are cyclic graphs, so information can flow back through the network across time steps.
– They are networks with loops in them, allowing information to persist.
• Different flavors of RNNs:
– An output at each time step and recurrent connections between hidden units
– An output at each time step and recurrent connections only from output units
– An output only after the entire sequence is fed into the network, with recurrent connections between hidden units.
• RNNs can simulate a Turing Machine and can represent any computable function
– Siegelmann and Sontag, 1995.
– Used an RNN of finite size consisting of 886 units.
• 38. RNNs in Practice
• Types of RNNs used in practice:
– Vanilla RNNs
– Bidirectional RNNs
– Deep Bidirectional RNNs
– Long Short-Term Memory (LSTM)
• Practical applications of RNNs:
– Language modeling and generating text
– Machine translation
– Speech recognition
– Generating image descriptions
• 39. Computational Graphs
• Computational graph: a formalization of the structure of a set of computations.
• Unfolding a recursive computation into a graph with repetitive structure results in parameter sharing across a deep network structure.
• Essentially any function involving a recurrence can be considered an RNN.
• Hidden units in an RNN:
– h(t) = f(h(t-1), x(t), θ)
– Notice that θ is the same at each time step (see the sketch below).
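A minimal scalar sketch of the recurrence h(t) = f(h(t-1), x(t), θ). Real RNNs use weight matrices; here w-h, w-x and b stand in for the shared parameters θ, reused (not relearned) at every time step. All names are ours.

(defn rnn-step [{:keys [w-h w-x b]} h-prev x]
  (Math/tanh (+ (* w-h h-prev) (* w-x x) b)))

(defn run-rnn
  "Folds the shared parameters theta over the input sequence xs, returning the
   sequence of hidden states starting from h0."
  [theta h0 xs]
  (reductions (partial rnn-step theta) h0 xs))

(run-rnn {:w-h 0.5 :w-x 1.0 :b 0.0} 0.0 [1.0 0.0 -1.0])
;; => (0.0 0.762 0.363 -0.674) approximately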
• 41. Training RNNs
• Backpropagation in computational graphs
– Backpropagation can be derived for any computational graph by recursively applying the chain rule (Deep Learning, Chapter 6).
– The backpropagation algorithm consists of performing a Jacobian-gradient product for each operation in the graph.
– In vector calculus, the Jacobian matrix is the matrix of all first-order partial derivatives of a vector-valued function.
• Backpropagation Through Time (BPTT)
– The gradient at each output depends not only on the calculations of the current time step, but also on the previous time steps.
– Vanilla RNNs trained with BPTT have difficulties learning long-term dependencies, i.e. dependencies between steps (words) that are far apart.
• “I grew up in France… I speak fluent French”
– BPTT suffers from the vanishing/exploding gradient problem (illustrated below).
• Vanishing gradient: the gradients get smaller and smaller in magnitude as you backpropagate through earlier layers (or back through time).
• Saturating activation functions like the sigmoid have derivatives bounded by a small constant (at most 1/4 for the sigmoid), which easily causes the gradient to vanish in earlier layers.
• Exploding gradient: more of an issue with recurrent networks, where the opposite happens when the recurrent Jacobians have norm greater than 1.
– Certain types of RNNs (like LSTMs) were specifically designed to get around these problems.
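A toy illustration of the vanishing/exploding effect (our own example, not from the slides): backpropagating through T time steps multiplies the gradient by roughly the same factor T times, so the result either collapses toward zero or blows up.

(defn backprop-factor
  "Gradient scale after backpropagating through `steps` time steps when each
   step multiplies the gradient by `factor`."
  [factor steps]
  (Math/pow factor steps))

(backprop-factor 0.25 50) ;; => about 7.9e-31  (vanishes; 0.25 is the sigmoid's maximum derivative)
(backprop-factor 1.5  50) ;; => about 6.4e8    (explodes)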
• 42. Long Short-Term Memory (LSTM)
• LSTMs are a special kind of RNN, capable of learning long-term dependencies.
• Successful in handwriting recognition, speech recognition, image captioning and machine translation.
• A type of gated network.
• Introduced by Hochreiter & Schmidhuber (1997).
– Added self-loops which allow the gradient to flow for long durations.
– The weight on the self-loop is based on context rather than fixed (Gers et al., 2000).
– Based on the idea of creating paths through the network in which the gradient neither vanishes nor explodes.
• Leaky units allow information to accumulate over a long duration.
• LSTMs generalize leaky units by allowing connection weights to change over time.
• LSTMs allow the network to decide when to forget information.
• A single hidden unit in an LSTM is replaced with a recurrent network cell consisting of 4 components that interact with each other.
• 43. Gated Network Cells
• Gated network cells replace the hidden units of RNNs.
• The input feature is computed by a regular ANN unit.
• The input can be accumulated into the state if the input gate allows it.
• The state has a self-loop controlled by the forget gate.
• The output can be turned off by the output gate (a minimal cell sketch follows below).
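A minimal scalar sketch of one LSTM cell step, matching the gate descriptions above. Real cells use weight matrices and vectors; here every gate is a single weighted sum, and all parameter names are ours rather than the slides'.

(defn sigmoid [x] (/ 1.0 (+ 1.0 (Math/exp (- x)))))

(defn lstm-step
  "Takes the previous cell state c and hidden output h plus the input x,
   returns the new [c h]."
  [{:keys [wf uf bf wi ui bi wo uo bo wc uc bc]} [c h] x]
  (let [f     (sigmoid (+ (* wf x) (* uf h) bf))   ;; forget gate: how much old state to keep
        i     (sigmoid (+ (* wi x) (* ui h) bi))   ;; input gate: how much new input to accumulate
        o     (sigmoid (+ (* wo x) (* uo h) bo))   ;; output gate: how much state to expose
        cand  (Math/tanh (+ (* wc x) (* uc h) bc)) ;; candidate new content
        c-new (+ (* f c) (* i cand))               ;; gated self-loop on the cell state
        h-new (* o (Math/tanh c-new))]
    [c-new h-new]))

;; Processing a whole sequence:
;; (reduce (partial lstm-step params) [0.0 0.0] xs)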
  • 44. LSTM in NLP Generation Image credit: Google Research Blog
• 45. LSTM Summary
• A type of RNN architecture that addresses the vanishing/exploding gradient problem.
• LSTMs allow the learning of long-term dependencies, which is crucial for sequences of inputs.
• Recently achieved state-of-the-art performance in speech recognition, language modeling, translation and image captioning.
• 46. Additional Topics…
• Generative Adversarial Networks (GANs)
• Deep Reinforcement Learning (DRL)
• Differentiable Neural Computers (DNCs)
• 47. Part 3 Generative Adversarial Networks (GANs)
Generative Adversarial Networks are an example of generative models. GANs focus primarily on sample generation, though it is possible to design GANs that can estimate the probability distribution.
• 48. GAN Framework
• Based on the idea of a two-player game
– Player 1: Generator
– Player 2: Discriminator
• The generator generates samples and tries to fool the discriminator.
• The discriminator tries to tell the generator’s fake samples apart from real training samples (the minimax objective is written out below).
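For reference, the two-player game is usually written as the minimax objective below; this is the standard formulation from Goodfellow et al. and the NIPS 2016 tutorial cited in the references, not a formula shown on the slide. D tries to maximize V, G tries to minimize it.

\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_{z}}[\log(1 - D(G(z)))]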
• 49. Why GANs are useful
• When predicting the next frame in a video, training with Mean Squared Error (MSE) averages over many possible futures, which blurs the prediction: in the example image the ear disappears and the eyes are blurred.
• The adversarial version does a much better job preserving the ear and not blurring the eyes.
Image credit: Ian Goodfellow, GANs Tutorial, NIPS 2016
• 50. GANs Summary
• GANs are generative models that use supervised learning to approximate an intractable cost function.
• Training GANs requires finding Nash equilibria in high-dimensional, continuous, non-convex games.
• GANs are crucial to many different state-of-the-art image generation and manipulation systems.
• 51. Part 4 Deep Reinforcement Learning (DRL)
Deep Reinforcement Learning combines Deep Learning and Reinforcement Learning by using Deep Learning techniques to learn values for the Q function in Reinforcement Learning. This is described in Google DeepMind’s Atari paper and exemplified by the AlphaGo program.
• 52. Deep Reinforcement Learning
• Combines Reinforcement Learning with Deep Learning.
• A form of model-free learning, sometimes loosely described as unsupervised since no labeled examples are required: the agent learns from rewards.
• Uses neural nets to estimate Q values.
• Very new field; no Wikipedia page on this topic.
• The idea is to feed states and actions into the network to predict Q values.
• Neural networks are exceptionally good at coming up with good features for highly structured data.
• This is the technology used by Google DeepMind’s AlphaGo program.
• 53. Reinforcement Learning Revisited
• Definitions
– Policy π is a way of selecting an action given a state.
– Value function Qπ(s,a) is the expected total reward for performing action a from state s and then following policy π.
• Different approaches
– Policy-based RL
• Search for the optimal policy in the space of policies
– Value-based RL
• Estimate the optimal value function Q*(s,a) (the Bellman equation is written out below)
– Model-based RL
• Build a model of the environment and use lookahead
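For reference, in standard RL notation (not taken from the slide) the optimal value function satisfies the Bellman optimality equation, where r is the immediate reward, γ the discount factor and s' the next state:

Q^{*}(s,a) = \mathbb{E}\big[\, r + \gamma \max_{a'} Q^{*}(s', a') \;\big|\; s, a \,\big]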
• 54. The Many States Problem
• In the Nature DeepMind Atari paper:
• Take the four last screen images, resize them to 84×84 and convert them to gray scale with 256 gray levels.
• This yields 256^(84×84×4) ≈ 10^67970 possible game states.
• This means 10^67970 rows in our imaginary Q-table.
• That is more than the number of atoms in the known universe!
• 56. Deep Q-Learning Error & Gradient
• Represent the Q function using a deep network.
• Error function (see the reconstruction below)
• Gradient (see the reconstruction below)
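The formulas on the original slide are images. A common reconstruction, following the "Demystifying Deep Reinforcement Learning" post listed in the references, treats the Q-network as a regression with a squared error between the Q-learning target and the current prediction; the target term is treated as a constant when differentiating with respect to the network weights θ.

L(\theta) = \tfrac{1}{2}\big[\, r + \gamma \max_{a'} Q(s', a'; \theta) - Q(s, a; \theta) \,\big]^{2}

\frac{\partial L}{\partial \theta} = -\big[\, r + \gamma \max_{a'} Q(s', a'; \theta) - Q(s, a; \theta) \,\big]\, \frac{\partial Q(s, a; \theta)}{\partial \theta}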
• 57. Strategies & Tricks
• Experience Replay
– During gameplay all the experiences <s, a, r, s′> are stored in a replay memory.
– When training the network, random samples from the replay memory are used instead of the most recent transition.
– This breaks the similarity of subsequent training samples, which otherwise might drive the network into a local minimum.
– Experience replay also makes the training task more similar to usual supervised learning, which simplifies debugging and testing the algorithm.
– One could actually collect all these experiences from human gameplay and then train the network on them.
• Exploration-Exploitation
– ε-greedy exploration: with probability ε choose a random action, otherwise go with the “greedy” action with the highest Q-value (see the sketch below).
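A sketch of the two tricks above; the representation (experiences as [s a r s'] vectors, a q-fn returning an estimated Q-value) and the names are our assumptions, not the paper's code.

(defn sample-minibatch
  "Draws n random experiences [s a r s'] from the replay memory instead of
   using only the most recent transition."
  [replay-memory n]
  (take n (shuffle replay-memory)))

(defn epsilon-greedy
  "With probability eps picks a random action; otherwise picks the action
   whose estimated Q-value (q-fn state action) is highest."
  [q-fn state actions eps]
  (if (< (rand) eps)
    (rand-nth actions)
    (apply max-key #(q-fn state %) actions)))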
• 60. References (1)
• Neural Nets & Deep Learning
– http://neuralnetworksanddeeplearning.com/chap2.html
– http://deeplearning.net/tutorial/deeplearning.pdf
• Convolutional Neural Networks
– http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
– http://neuralnetworksanddeeplearning.com/chap6.html
– http://cs231n.github.io/convolutional-networks/
– Visualizing and Understanding Convolutional Networks
– Convolutional Neural Networks backpropagation: from intuition to derivation
– An Intuitive Explanation of Convolutional Neural Networks
– Backpropagation in Convolutional Neural Networks
• Recurrent Neural Nets
– http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf
– http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
– http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-2-implementing-a-language-model-rnn-with-python-numpy-and-theano/
– http://www.wildml.com/2015/10/recurrent-neural-networks-tutorial-part-3-backpropagation-through-time-and-vanishing-gradients/
– http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-4-implementing-a-grulstm-rnn-with-python-and-theano/
• 61. References (2)
• Generative Adversarial Networks
– NIPS 2016 Tutorial: Generative Adversarial Networks
• Deep Reinforcement Learning
– http://www0.cs.ucl.ac.uk/staff/d.silver/web/Resources_files/deep_rl.pdf
– http://neuro.cs.ut.ee/demystifying-deep-reinforcement-learning/
• Differentiable Neural Computers
– https://deepmind.com/blog/differentiable-neural-computers/
• Google DeepMind DRL Atari Paper
– https://storage.googleapis.com/deepmind-media/alphago/AlphaGoNaturePaper.pdf
• 62. Questions
• Goodfellow quote on BP on Quora
• Vanishing / exploding gradient