6. WHAT IS THIS TALK ABOUT?
Using Neural Networks and Deep Learning to recognize images
By the end of the class you will be able to create your own deep learning systems
13. HISTORY OF MACHINE LEARNING
Input     Features  Algorithm  Output
Machine   Human     Human      Machine
Machine   Human     Machine    Machine
Machine   Machine   Machine    Machine
15. DEEP LEARNING MILESTONES
Years  Theme
1980s  Backpropagation popularized, allowing multi-layer neural networks to be trained
2000s  SVMs, Random Forests and other classifiers overtook NNs
2010s  Deep Learning reignited interest in NNs
16. IMAGENET
AlexNet, submitted to the ImageNet ILSVRC challenge in 2012, is partly responsible for the renaissance.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton used Deep Learning techniques.
They combined these with GPU training and other techniques such as ReLU activations and dropout.
The result was a neural network that could classify images into 1,000 object categories.
It had a top-5 error rate of 15.3%, compared to 26.2% for the runner-up.
20. MACHINE LEARNING AND DEEP LEARNING
Deep Learning fits inside Machine Learning
Deep Learning is a Machine Learning technique
They share techniques for evaluating and optimizing models
21. WHAT IS MACHINE LEARNING?
Inputs: vectors, i.e. points in a high-dimensional space
Outputs: either binary vectors or continuous vectors
Machine Learning finds the relationship between them
It uses statistical techniques
25. CLASSIFICATION EXAMPLE: EMAIL SPAM DETECTION
Start with a large collection of emails, labeled spam/not-spam
Convert email text into vectors of 0s and 1s: 1 if a word occurs, 0 if it does not
These are called inputs or features
Split the data set into a training set (70%) and a test set (30%)
Use an algorithm like Random Forest to build a model
Evaluate the model by running it on the test set and capturing the success rate
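The spam pipeline above can be sketched in plain Python. This is a toy illustration under stated assumptions, not a real spam filter: the ten emails are made up, words are split on whitespace, and a 1-nearest-neighbour classifier (Hamming distance) stands in for Random Forest so the sketch needs no third-party library. All function names are hypothetical.

```python
import random

def build_vocabulary(texts):
    """Every distinct lowercase word across all emails, in sorted order."""
    return sorted({word for text in texts for word in text.lower().split()})

def to_binary_vector(text, vocabulary):
    """1 if a vocabulary word occurs in the email, 0 if it does not."""
    words = set(text.lower().split())
    return [1 if word in words else 0 for word in vocabulary]

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows, then split them into training and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def predict(train, vector):
    """Stand-in model (not Random Forest): label of the nearest training vector."""
    _, nearest_label = min(
        train, key=lambda row: sum(a != b for a, b in zip(row[0], vector)))
    return nearest_label

def success_rate(train, test):
    """Fraction of test emails whose predicted label matches the true label."""
    return sum(predict(train, vec) == label for vec, label in test) / len(test)

# Hypothetical labeled emails.
emails = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap pills free shipping", "spam"),
    ("claim your free money", "spam"),
    ("winner free prize inside", "spam"),
    ("meeting notes attached", "not-spam"),
    ("lunch tomorrow at noon", "not-spam"),
    ("project meeting moved to noon", "not-spam"),
    ("notes from the project review", "not-spam"),
    ("see you at lunch tomorrow", "not-spam"),
]

vocabulary = build_vocabulary(text for text, _ in emails)
rows = [(to_binary_vector(text, vocabulary), label) for text, label in emails]
train, test = train_test_split(rows)
rate = success_rate(train, test)
print(f"success rate on the test set: {rate:.2f}")
```

Swapping in a different model only changes `predict`; the feature extraction, split, and evaluation stay the same.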
27. CHOOSING ALGORITHM
Evaluate different models on data
Look at the relative success rates
Use rules of thumb: some algorithms work better on some kinds of data
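Comparing candidate algorithms comes down to scoring each one on the same held-out data. A minimal sketch, assuming models are plain functions from features to labels; the two toy "models" and the six test points are hypothetical:

```python
def success_rate(model, test_set):
    """Fraction of (features, label) pairs the model labels correctly."""
    return sum(model(x) == y for x, y in test_set) / len(test_set)

def choose_model(models, test_set):
    """Score every candidate on the same test set; return the best name and all scores."""
    scores = {name: success_rate(model, test_set) for name, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical task: label a number 1 if it is "large", else 0.
test_set = [(0.2, 0), (0.9, 1), (0.4, 0), (0.7, 1), (0.6, 1), (0.1, 0)]
models = {
    "always_zero": lambda x: 0,                      # trivial baseline
    "threshold_0.5": lambda x: 1 if x > 0.5 else 0,  # simple rule
}
best, scores = choose_model(models, test_set)
print(best, scores)  # -> threshold_0.5 {'always_zero': 0.5, 'threshold_0.5': 1.0}
```

Including a trivial baseline makes it obvious when a "good" success rate is no better than guessing.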
28. CLASSIFICATION EXAMPLES
Is this tumor benign or cancerous?
Is this lead profitable or not?
Who will win the presidential election?
29. CLASSIFICATION: POP QUIZ
Is classification supervised or unsupervised learning?
Supervised because you have to label the data.
30. CLUSTERING EXAMPLE: LOCATE CELL PHONE TOWERS
Start with GPS coordinates of all cell phone users
Represent data as vectors
Locate towers in the biggest clusters
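The slide does not name a clustering algorithm; k-means is one common choice, so here is a minimal stdlib sketch of it (Lloyd's algorithm). The user coordinates are made up, and treating latitude/longitude as flat Euclidean coordinates is a simplifying assumption that only holds over small areas. Each returned centroid is a candidate tower location.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on 2-D points; returns (centroids, cluster label per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels

# Hypothetical users in two neighbourhoods.
users = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
towers, labels = kmeans(users, k=2)
print(towers, labels)
```

k (the number of towers) must be chosen up front; in practice one would try several values and compare coverage.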
31. CLUSTERING EXAMPLE: T-SHIRTS
What size should a t-shirt be?
Everyone’s real t-shirt size is different
Lay out all sizes and cluster
Target large clusters with XS, S, M, L, XL
32. CLUSTERING: POP QUIZ
Is clustering supervised or unsupervised?
Unsupervised because no labeling is required
36. REGRESSION EXAMPLES
How many units of product will sell next month?
What will a student score on the SAT?
What is the market price of this house?
How long before this engine needs repair?
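Regression predicts a continuous number rather than a class. As a minimal sketch, assuming a single input feature and made-up monthly sales figures, ordinary least squares fits a straight line and extrapolates it to next month:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one input feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical unit sales for months 1..5.
months = [1, 2, 3, 4, 5]
units = [100, 120, 140, 160, 180]
slope, intercept = fit_line(months, units)
forecast = slope * 6 + intercept  # continuous prediction for month 6
print(forecast)  # -> 200.0
```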
37. REGRESSION EXAMPLE: AIRCRAFT PART FAILURE
Cessna collects data from airplane sensors
Predict when a part needs to be replaced
Ship the part to the customer’s service airport
39. ANOMALY DETECTION EXAMPLE: CREDIT CARD FRAUD
Train model on good transactions
Anomalous activity indicates fraud
Can pass the transaction to a human for investigation
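One of the simplest anomaly detectors learns what "normal" looks like from good data only and flags anything far from it. A minimal sketch, assuming a single feature (transaction amount) and a z-score threshold of 3; both are illustrative choices, and the amounts are made up:

```python
import statistics

def fit_normal_profile(amounts):
    """Learn mean and standard deviation from known-good transactions only."""
    return statistics.mean(amounts), statistics.stdev(amounts)

def is_anomalous(amount, profile, threshold=3.0):
    """Flag a transaction more than `threshold` standard deviations from the mean."""
    mean, std = profile
    return abs(amount - mean) / std > threshold

# Hypothetical good transactions used for training.
good = [20, 25, 22, 30, 27, 24, 26, 23, 21, 28]
profile = fit_normal_profile(good)

incoming = [24, 26, 950]
flagged = [a for a in incoming if is_anomalous(a, profile)]
print(flagged)  # -> [950], which would be passed to a human for investigation
```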
40. ANOMALY DETECTION EXAMPLE: NETWORK INTRUSION
Train model on network login activity
Anomalous activity indicates threat
Can initiate alerts and lockdown procedures
41. ANOMALY DETECTION: QUIZ
Is anomaly detection supervised or unsupervised?
Unsupervised because we only train on normal data
48. DEEP LEARNING FRAMEWORKS
TensorFlow: NN library from Google
Theano: Low-level GPU-enabled tensor library
Torch7: NN library scripted in Lua, used by Facebook and Google
Caffe: NN library by the Berkeley Vision and Learning Center (BVLC)
Nervana: Fast GPU-based machines optimized for deep learning
49. DEEP LEARNING FRAMEWORKS
Keras, Lasagne, Blocks: NN libraries that make Theano easier to use
CUDA: Programming model for using GPUs in general-purpose programming
cuDNN: NN library by Nvidia based on CUDA, can be used with Torch7 and Caffe
Chainer: NN library that uses CUDA
51. TENSORFLOW
TensorFlow originally developed by the Google Brain Team
Allows using GPUs for deep learning algorithms
Single processor version released in 2015
Multiple processor version released in March 2016
52. KERAS
Supports Theano and TensorFlow as back-ends
Provides a deep learning API on top of TensorFlow
TensorFlow provides low-level matrix operations
58. MATHEMATICAL FUNCTION
A neuron is a mathematical function
It adds up its (weighted) inputs and applies a sigmoid (or other) function
This determines whether it fires or not
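Written out, a neuron is a weighted sum of its inputs plus a bias (the base signal the neuron receives), passed through a sigmoid. A minimal sketch; treating an output above 0.5 as "firing" is an illustrative convention, and the weights are hand-picked:

```python
import math

def sigmoid(z):
    """Squash any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def fires(inputs, weights, bias, threshold=0.5):
    """Illustrative convention: the neuron 'fires' when its output exceeds the threshold."""
    return neuron(inputs, weights, bias) > threshold

weights, bias = [2.0, -3.0], -1.0
print(neuron([1.0, 0.0], weights, bias))  # sigmoid(2 - 1) = sigmoid(1), about 0.73
print(fires([1.0, 0.0], weights, bias))   # True
print(fires([0.0, 1.0], weights, bias))   # False: sigmoid(-4) is about 0.02
```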
59. WHAT ARE NEURAL NETWORKS?
Biologically inspired machine learning algorithm
Mathematical neurons arranged in layers
Accumulate signals from the previous layer
Fire when signal reaches threshold
61. NEURON INCOMING
Each neuron receives signals from neurons in the previous layer
Each signal is scaled by a weight
Some signals are more important than others
Bias is the base signal that the neuron receives
69. NEURON LAYERS
The nomination is the last layer, layer N
States are layer N-1
Counties are layer N-2
Districts are layer N-3
Individuals are layer N-4
Individual brains have even more layers
71. TRAINING: HOW DO WE IMPROVE?
Calculate error from desired goal
Increase the weights of neurons that voted right
Decrease the weights of neurons that voted wrong
This will reduce error
73. FEED FORWARD
Also called forward propagation or forward prop
Initialize inputs
Calculate activation of each hidden layer
Calculate activation of output layer
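The feed-forward steps can be sketched as a loop over layers, where each layer's activations become the next layer's inputs. A minimal sketch, assuming fully connected sigmoid layers stored as hypothetical `(weights, biases)` pairs with hand-picked values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """One fully connected layer: row j of `weights` holds neuron j's incoming weights."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    """Feed the inputs through each (weights, biases) layer in order."""
    activation = inputs  # initialize inputs
    for weights, biases in layers:
        activation = layer_forward(activation, weights, biases)
    return activation    # activation of the output layer

# Hypothetical 2-3-1 network.
layers = [
    ([[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1]),  # hidden layer: 3 neurons
    ([[1.0, -1.0, 0.5]], [0.2]),                                 # output layer: 1 neuron
]
output = forward([1.0, 0.5], layers)
print(output)
```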
74. BACK PROPAGATION
Use forward prop to calculate the error
The error is a function of all network weights
Adjust weights using gradient descent
Repeat with the next record
Keep going over the training set until convergence
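The back-propagation loop above can be sketched for a network with one hidden layer. This is a minimal sketch under assumptions not in the slides: sigmoid units, squared error, per-record updates, a fixed epoch count in place of a convergence test, and the logical OR function as toy training data.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_network(data, hidden=3, learning_rate=0.5, epochs=5000, seed=0):
    """Train a one-hidden-layer sigmoid network with backprop and per-record gradient descent."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]  # input -> hidden
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]                         # hidden -> output
    b2 = 0.0

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
        return h, sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)

    def mean_squared_error():
        return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    history = [mean_squared_error()]
    for _ in range(epochs):               # keep going over the training set
        for x, t in data:                 # one record at a time
            h, y = forward(x)             # forward prop
            # Backward pass for E = (y - t)^2 / 2, using sigmoid'(z) = s * (1 - s).
            delta_out = (y - t) * y * (1 - y)
            delta_hidden = [delta_out * w2[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            # Gradient descent: move every weight against its partial derivative.
            for j in range(hidden):
                w2[j] -= learning_rate * delta_out * h[j]
                b1[j] -= learning_rate * delta_hidden[j]
                for i in range(n_in):
                    w1[j][i] -= learning_rate * delta_hidden[j] * x[i]
            b2 -= learning_rate * delta_out
        history.append(mean_squared_error())
    return forward, history

or_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
predict, history = train_network(or_data)
print(f"error before: {history[0]:.3f}, after: {history[-1]:.3f}")
```

`delta_hidden` is computed before `w2` is updated so every gradient uses the weights from the same forward pass.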
75. HOW DO YOU FIND THE MINIMUM IN AN N-DIMENSIONAL SPACE?
Take a step in the direction of steepest descent.
That direction is the negative gradient, where the gradient is the vector of all partial derivatives.
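A minimal gradient descent sketch in N dimensions: at each step, compute every partial derivative and move against the resulting gradient vector. The 3-dimensional bowl `f(x) = sum((x_i - c_i)^2)`, its centre, and the learning rate are illustrative choices.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient, the direction of steepest descent."""
    point = list(start)
    for _ in range(steps):
        g = grad(point)
        point = [p - learning_rate * gi for p, gi in zip(point, g)]
    return point

# Example bowl in 3 dimensions with its minimum at `center`.
center = [1.0, -2.0, 3.0]

def grad_f(point):
    # Partial derivative of f with respect to x_i is 2 * (x_i - c_i).
    return [2 * (p - c) for p, c in zip(point, center)]

minimum = gradient_descent(grad_f, start=[0.0, 0.0, 0.0])
print(minimum)  # very close to [1.0, -2.0, 3.0]
```

With this learning rate each step shrinks the distance to the minimum by a factor of 0.8; a learning rate that is too large (here, 1.0 or more) would overshoot and never settle.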
77. PUTTING ALL THIS TOGETHER
Use forward prop to activate
Use back prop to train
Then use forward prop to test
82. BENEFITS OF RELU
Popular
Accelerates convergence by 6x compared to tanh (Krizhevsky et al)
Operation is faster since it is piecewise linear, not exponential
Neurons can die by getting stuck at zero
Pro: sparse activations
Con: network can die
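ReLU is just `max(0, z)`, a comparison instead of an exponential. A minimal sketch showing both sides listed above: many outputs are exactly zero (sparsity), and a neuron whose input is always negative gets zero gradient and stops learning (a "dead" ReLU). The pre-activation values are made up.

```python
def relu(z):
    return max(0.0, z)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise (0 chosen at z == 0)."""
    return 1.0 if z > 0 else 0.0

pre_activations = [-2.0, -0.5, 0.0, 0.7, 3.0]
outputs = [relu(z) for z in pre_activations]
sparsity = outputs.count(0.0) / len(outputs)   # fraction of exactly-zero activations

# Negative pre-activation for every input: zero gradient, so the weights never change.
dead_neuron_grads = [relu_grad(z) for z in [-3.0, -1.2, -0.4]]

print(outputs)            # -> [0.0, 0.0, 0.0, 0.7, 3.0]
print(sparsity)           # -> 0.6
print(dead_neuron_grads)  # -> [0.0, 0.0, 0.0]
```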
86. PROBLEM: OIL EXPLORATION
Drilling holes is expensive
We want to find the biggest oilfield without wasting money on duds
Where should we plant our next oilfield derrick?
88. HYPERPARAMETER EXAMPLE
How many layers should we have?
How many neurons should we have in hidden layers?
Should we use Sigmoid, Tanh, or ReLU?
Should we initialize
91. RANDOM
Randomly sample points from the grid
Remember the best found so far
Bergstra and Bengio’s result and Alice Zheng’s explanation (see References):
60 random samples get you within the top 5% of the grid with 95% probability
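The 60-sample figure follows from a short calculation: if each sample lands in the top 5% with probability 0.05, the chance that none of 60 independent samples does is 0.95^60 ≈ 0.046, so at least one hits with probability about 0.954. A minimal sketch of random search plus that calculation; the grid mirrors the hyperparameter questions above, and `toy_score` is a hypothetical stand-in for training and evaluating a real model.

```python
import random

def random_search(grid, score, samples=60, seed=0):
    """Sample configurations uniformly from the grid; keep the best seen so far."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(samples):
        config = {name: rng.choice(values) for name, values in grid.items()}
        s = score(config)
        if s > best_score:
            best_config, best_score = config, s
    return best_config, best_score

def prob_hit_top_fraction(samples, fraction=0.05):
    """Chance that at least one independent uniform sample lands in the top `fraction`."""
    return 1 - (1 - fraction) ** samples

grid = {"layers": [1, 2, 3, 4], "hidden": [16, 32, 64, 128],
        "activation": ["sigmoid", "tanh", "relu"]}

def toy_score(config):
    # Hypothetical objective; a real one would train a network and return its test success rate.
    return (-abs(config["layers"] - 3) - abs(config["hidden"] - 64) / 64
            + (0.5 if config["activation"] == "relu" else 0.0))

best_config, best_score = random_search(grid, toy_score)
print(best_config, best_score)
print(prob_hit_top_fraction(60))  # about 0.954
```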
97. DEPLOYING
Phases: training, deployment
Training phase runs on back-end servers
Optimize hyperparameters on back-end
Deploy model to front-end servers, browsers, devices
Front-end only uses forward prop and is fast
99. HDF5
Keras serializes model architecture to JSON
Keras serializes weights to HDF5
HDF5: a data model and file format for hierarchical data
APIs for C++, Python, Java, etc.
https://www.hdfgroup.org
100. DEPLOYMENT EXAMPLE: CANCER DETECTION
Rhobota.com’s cancer-detecting iPhone app
Developed by Bryan Shaw after his son’s illness
Model built on back-end, deployed on iPhone
The iPhone app detects retinal cancer
102. WHAT IS DEEP LEARNING?
Deep Learning is a learning method that can train the
system with more than 2 or 3 non-linear hidden layers.
103. WHAT IS DEEP LEARNING?
Machine learning techniques which enable unsupervised
feature learning and pattern analysis/classification.
The essence of deep learning is to compute
representations of the data.
Higher-level features are defined from lower-level ones.
104. HOW IS DEEP LEARNING DIFFERENT FROM REGULAR NEURAL NETWORKS?
Training neural networks requires applying gradient descent in millions of dimensions.
This is intractable for large networks.
Deep learning places constraints on neural networks.
These constraints make them trainable iteratively.
The constraints are generic.
106. WHAT ARE AUTO-ENCODERS?
An auto-encoder is a neural network learning algorithm
It applies backpropagation with the target values set equal to its inputs
In other words, it trains itself to approximate the identity transformation
108. WHY DOES IT DO THIS?
An auto-encoder places constraints on itself
E.g., it restricts the number of hidden neurons
Because it cannot simply copy its input through, it must find a good representation of the data
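A minimal auto-encoder sketch: the target is the input itself, and a 2-neuron hidden layer between 4 inputs and 4 outputs is the constraint. Assumptions not in the slides: the layers are linear (no activation) to keep the gradients short, and the toy data `[a, b, a + b, a - b]` only has 2 degrees of freedom, so a 2-unit code can reconstruct it.

```python
import random

def train_autoencoder(data, hidden=2, lr=0.01, epochs=500, seed=0):
    """Linear auto-encoder n -> hidden -> n trained by per-sample gradient descent."""
    rng = random.Random(seed)
    n = len(data[0])
    enc = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(hidden)]  # hidden x n
    dec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(n)]  # n x hidden

    def encode(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in enc]

    def decode(h):
        return [sum(w * hj for w, hj in zip(row, h)) for row in dec]

    def reconstruction_error():
        return sum(sum((r - xi) ** 2 for r, xi in zip(decode(encode(x)), x))
                   for x in data) / len(data)

    history = [reconstruction_error()]
    for _ in range(epochs):
        for x in data:
            h = encode(x)
            err = [r - xi for r, xi in zip(decode(h), x)]  # the target is the input itself
            # Backpropagate through the decoder to the hidden code (before updating dec).
            grad_h = [sum(dec[i][j] * err[i] for i in range(n)) for j in range(hidden)]
            for i in range(n):
                for j in range(hidden):
                    dec[i][j] -= lr * err[i] * h[j]
            for j in range(hidden):
                for k in range(n):
                    enc[j][k] -= lr * grad_h[j] * x[k]
        history.append(reconstruction_error())
    return encode, decode, history

# Hypothetical 4-D data that secretly has only 2 degrees of freedom.
rng = random.Random(1)
data = []
for _ in range(20):
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data.append([a, b, a + b, a - b])

encode, decode, history = train_autoencoder(data)
print(f"reconstruction error: {history[0]:.3f} -> {history[-1]:.5f}")
print(encode(data[0]))  # the learned 2-number representation of a 4-number input
```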
112. CNNS
The convolutional layer’s parameters are a set of
learnable filters
Every filter is small along width and height
During the forward pass, each filter slides across the width
and height of the input, producing a 2-dimensional
activation map
As we slide across the input, we compute the dot product between the filter and the input patch under it
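The sliding dot product can be sketched directly. Like most CNN libraries, this computes cross-correlation (the filter is not flipped); stride 1 and no padding are assumed, and the image and vertical-edge filter are hand-picked examples.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution as used in CNNs: cross-correlation, stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    activation_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Dot product between the filter and the image patch under it.
            row.append(sum(kernel[i][j] * image[r + i][c + j]
                           for i in range(kh) for j in range(kw)))
        activation_map.append(row)
    return activation_map

image = [[0, 0, 1, 1] for _ in range(4)]  # dark left half, bright right half
kernel = [[-1, 1], [-1, 1]]               # responds to a dark-to-bright vertical edge
print(conv2d(image, kernel))  # -> [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The filter responds in the middle column wherever the edge sits, in every row, which is the "feature anywhere" behaviour described on the next slide.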
113. CNNS
Intuitively, the network learns filters that activate when
they see a specific type of feature anywhere
In this way it creates translation invariance
114. CONVNET EXAMPLE
Zero-padding: the borders of the input are padded with zeros
Stride: how many pixels the filter moves at each step of the convolution
Parameter sharing: each filter uses the same weights at every position of the input
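Padding and stride together determine the size of each activation map: for input width W, filter width F, padding P, and stride S, the output width is (W - F + 2P) / S + 1. A minimal sketch of that formula and of zero-padding; this version rejects settings that do not tile evenly, whereas frameworks typically round down instead.

```python
def output_size(input_size, filter_size, padding=0, stride=1):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    span = input_size - filter_size + 2 * padding
    if span % stride != 0:
        raise ValueError("filter does not tile the padded input evenly with this stride")
    return span // stride + 1

def zero_pad(image, padding):
    """Surround a 2-D list with `padding` rows and columns of zeros."""
    width = len(image[0]) + 2 * padding
    padded = [[0] * width for _ in range(padding)]
    padded += [[0] * padding + list(row) + [0] * padding for row in image]
    padded += [[0] * width for _ in range(padding)]
    return padded

print(output_size(5, 3, padding=1, stride=1))  # -> 5 (padding 1 preserves size for a 3x3 filter)
print(output_size(5, 3, padding=1, stride=2))  # -> 3
print(zero_pad([[1, 2], [3, 4]], 1))  # -> [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
```

Because of parameter sharing, a 3x3 filter has the same 9 weights (plus a bias) no matter how large the input or output is.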
117. WHAT IS A POOLING LAYER?
The pooling layer further reduces the spatial resolution of the activation maps
Max pooling tiles the output area with 2x2 windows and takes the maximum activation value of each window
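A minimal 2x2 max-pooling sketch, assuming non-overlapping windows (stride 2); a trailing odd row or column is simply dropped here. Each output value is the largest activation in its window, so a 4x4 map becomes 2x2.

```python
def max_pool_2x2(activation_map):
    """Max pooling with a 2x2 window and stride 2 (trailing odd row/column dropped)."""
    pooled = []
    for r in range(0, len(activation_map) - 1, 2):
        row = []
        for c in range(0, len(activation_map[0]) - 1, 2):
            window = (activation_map[r][c], activation_map[r][c + 1],
                      activation_map[r + 1][c], activation_map[r + 1][c + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled

activations = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [3, 2, 1, 6]]
print(max_pool_2x2(activations))  # -> [[4, 5], [3, 7]]
```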
121. RNNS
RNNs capture patterns in time series data
Constrained by weights shared across neurons: the same weights are reused at every time step
Each neuron (one copy of the unrolled network) observes a different time step
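A minimal recurrent-network sketch: the same weights are applied at every time step, and the hidden state carries information forward from earlier inputs. The Elman-style update `h_t = tanh(w_x * x_t + W_h h_{t-1} + b)`, scalar inputs, and the hand-picked weights are illustrative assumptions.

```python
import math

def rnn_forward(sequence, w_x, w_h, b):
    """Elman-style RNN: the same w_x, w_h, b are reused at every time step."""
    hidden = [0.0] * len(b)
    states = []
    for x_t in sequence:
        # The right-hand side reads the previous hidden state before it is replaced.
        hidden = [math.tanh(w_x[j] * x_t
                            + sum(w_h[j][k] * hidden[k] for k in range(len(hidden)))
                            + b[j])
                  for j in range(len(b))]
        states.append(hidden)
    return states

states = rnn_forward([1.0, 0.0, 0.0],
                     w_x=[0.5, -0.3],
                     w_h=[[0.8, 0.0], [0.1, 0.5]],
                     b=[0.0, 0.0])
for t, h in enumerate(states):
    print(t, h)  # later states are nonzero even though later inputs are 0: memory
```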
122. LSTMS
Long Short-Term Memory networks
Plain RNNs struggle to learn dependencies across long time lags between events
LSTMs can pick up patterns separated by long lags
Used for speech recognition
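The key LSTM idea is a cell state updated additively and guarded by gates, so a value can survive many steps. A minimal single-unit sketch using the standard forget/input/output gate equations; the hand-picked parameters are an assumption chosen so the input gate opens only when `x == 1` and the forget gate stays nearly open, letting a value stored at step 0 persist through 20 steps of zeros.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM; p holds input (w_*), recurrent (u_*) and bias (b_*) terms."""
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])    # forget gate
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate value
    c = f * c_prev + i * g   # additive cell update: the path that carries memory across lags
    h = o * math.tanh(c)
    return h, c

# Hypothetical hand-picked parameters.
params = {"w_f": 0.0, "u_f": 0.0, "b_f": 5.0,
          "w_i": 10.0, "u_i": 0.0, "b_i": -5.0,
          "w_o": 0.0, "u_o": 0.0, "b_o": 5.0,
          "w_g": 2.0, "u_g": 0.0, "b_g": 0.0}

h, c = 0.0, 0.0
for x in [1.0] + [0.0] * 20:  # one event, then a long lag
    h, c = lstm_step(x, h, c, params)
print(f"cell state after 20 quiet steps: {c:.3f}")  # still well above zero
```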
123. RNN EFFECTIVENESS
Andrej Karpathy uses LSTMs to generate text
Generates Shakespeare, Linux kernel code, mathematical proofs
See http://karpathy.github.io/
127. REFERENCES
Bayesian Optimization by Dewancker et al
Random Search for Hyper-Parameter Optimization by Bergstra and Bengio
Evaluating Machine Learning Models by Alice Zheng
http://sigopt.com
http://jmlr.org
http://www.oreilly.com
128. REFERENCES
Dropout by Hinton et al
Understanding LSTM Networks by Chris Olah
Multi-scale Deep Learning for Gesture Detection and Localization by Neverova et al
The Unreasonable Effectiveness of Recurrent Neural Networks by Karpathy
http://cs.utoronto.edu
http://github.io
http://uoguelph.ca
http://karpathy.github.io