1. Deep Learning for Computer Vision
Executive-ML 2017/09/21
Neither Proprietary nor Confidential – Please Distribute ;)
Alex Conway
alex @ numberboost.com
@alxcnwy
3. Check out the
Deep Learning Indaba
videos & practicals!
http://www.deeplearningindaba.com/videos.html
http://www.deeplearningindaba.com/practicals.html
18. 1. What is a neural network?
2. What is a convolutional neural network?
3. How to use a convolutional neural network
4. More advanced methods
5. Case studies & applications
19. Big Shout Outs
Jeremy Howard & Rachel Thomas
http://course.fast.ai
Andrej Karpathy
http://cs231n.github.io
François Chollet (Keras lead dev)
https://keras.io/
21. What is a neuron?
• 3 inputs [x1, x2, x3]
• 3 weights [w1, w2, w3]
• Element-wise multiply and sum
• Apply activation function f
• Often add a bias too (weight of 1) – not shown
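These steps can be sketched in a few lines of numpy (the input, weight, and bias values here are illustrative):

```python
import numpy as np

def neuron(x, w, b=1.0):
    """Single neuron: element-wise multiply inputs by weights, sum,
    add a bias, then apply an activation function (here a sigmoid)."""
    z = np.dot(x, w) + b             # weighted sum plus bias
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid activation

x = np.array([0.5, -1.0, 2.0])  # 3 inputs [x1, x2, x3]
w = np.array([0.1, 0.4, -0.2])  # 3 weights [w1, w2, w3]
print(neuron(x, w))             # a value in (0, 1)
```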
22. What is an Activation Function?
Sigmoid Tanh ReLU
Nonlinearities … “squashing functions” … transform the neuron’s output
NB: sigmoid output in [0, 1]
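The three squashing functions can be written directly in numpy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes to (0, 1)

def tanh(z):
    return np.tanh(z)                # squashes to (-1, 1)

def relu(z):
    return np.maximum(0.0, z)        # zero for negatives, identity otherwise

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))
```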
23. What is a (Deep) Neural Network?
[Diagram: inputs → hidden layer 1 → hidden layer 2 → hidden layer 3 → outputs]
Outputs of one layer are inputs into the next layer
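A minimal sketch of this layer-by-layer forward pass, with illustrative layer sizes (4 inputs, three hidden layers of 5 units, 2 outputs; weights are random stand-ins for learned values):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# One weight matrix per layer: 4 inputs -> 5 -> 5 -> 5 -> 2 outputs.
weights = [rng.standard_normal((4, 5)),
           rng.standard_normal((5, 5)),
           rng.standard_normal((5, 5)),
           rng.standard_normal((5, 2))]

def forward(x):
    a = x
    for W in weights[:-1]:
        a = relu(a @ W)    # each layer's output feeds the next layer
    return a @ weights[-1] # linear output layer

x = rng.standard_normal(4)  # 4 input features
print(forward(x).shape)     # 2 outputs
```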
24. How does a neural network learn?
• You need labelled examples (“training data”)
• Initially, the network makes random predictions (weights initialized randomly)
• For each training data point, we calculate the error between the network’s
predictions and the ground-truth labels (aka “loss function”)
• Use ‘backpropagation’ (really just the chain rule) to update the network
parameters (weights) in the opposite direction to the error gradient
25. How does a neural network learn?
new weight = old weight − learning rate × (gradient of error with respect to weight)
“How much the error increases when we increase this weight”
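The update rule on this slide, applied to a toy one-weight problem (the error function here is illustrative, chosen so the gradient is easy to write down):

```python
# One-weight gradient descent:
#   new_weight = old_weight - learning_rate * dError/dweight
# Toy error: Error(w) = (w - 3)^2, so dError/dw = 2 * (w - 3).

def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0    # randomly-initialized weight
lr = 0.1   # learning rate
for _ in range(100):
    w = w - lr * grad(w)  # step opposite the gradient
print(w)   # approaches 3.0, the minimum of the error
```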
28. What is a Neural Network?
For much more detail, see:
1. Michael Nielson’s Neural Networks & Deep
Learning free online book
http://neuralnetworksanddeeplearning.com/chap1.html
2. Andrej Karpathy’s CS231n Notes
http://cs231n.github.io
30. What is a Convolutional Neural Network?
“like an ordinary neural network but with special
types of layers that work well on images”
(math works on numbers)
• Pixel = 3 colour channels (R, G, B)
• Pixel intensity ∈[0,255]
• Image has width w and height h
• Therefore image is w x h x 3 numbers
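A quick numpy sketch of this (the 4 × 6 size is arbitrary):

```python
import numpy as np

# A synthetic 4 x 6 RGB "image": height x width x 3 colour channels,
# pixel intensities in [0, 255].
h, w = 4, 6
img = np.random.default_rng(0).integers(0, 256, size=(h, w, 3), dtype=np.uint8)

print(img.shape)  # (4, 6, 3)
print(img.size)   # 4 * 6 * 3 = 72 numbers
```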
31. Example Architecture
This is VGGNet – don’t panic, we’ll break it down piece by piece
35. New Layer Type: Convolutional Layer
• 2-D weighted sum: multiply the kernel element-wise over pixel patches and sum
• We slide the kernel over all pixels of the image (handle borders)
• Kernel starts off with “random” values and the network updates (learns)
the kernel values (using backpropagation) to try to minimize loss
• Kernels shared across the whole image (parameter sharing)
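A naive numpy sketch of the sliding-kernel computation (real CNN layers are far more optimized; this version simply lets the output shrink at the borders rather than padding them):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over a single-channel image. At each position,
    element-wise multiply the kernel with the pixel patch and sum."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))  # shrinks at the borders
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])  # in a CNN these values are learned
print(conv2d(image, kernel).shape)  # (3, 3)
```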
36. Many Kernels = Many “Activation Maps” = Volume
http://cs231n.github.io/convolutional-networks/
45. New Layer Type: Max Pooling
• Reduces dimensionality from one layer to next
• …by replacing NxN sub-area with max value
• Makes network “look” at larger areas of the image at a time
• e.g. Instead of identifying fur, identify cat
• Reduces overfitting, since losing information helps the network generalize
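A minimal numpy sketch of 2 × 2 max pooling over non-overlapping sub-areas:

```python
import numpy as np

def max_pool(x, n=2):
    """Replace each non-overlapping n x n sub-area with its max value.
    Assumes the input dimensions are divisible by n."""
    h, w = x.shape
    return x.reshape(h // n, n, w // n, n).max(axis=(1, 3))

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [9., 1., 2., 2.],
              [1., 1., 3., 4.]])
print(max_pool(x))  # [[4. 8.] [9. 4.]]
```

Each 2 × 2 block collapses to one number, so a 4 × 4 input becomes 2 × 2 — the next layer “looks” at a larger area of the original image.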
60. Using a Pre-Trained ImageNet-Winning CNN
• We’ve been looking at “VGGNet”
• Oxford Visual Geometry Group (VGG)
• ImageNet 2014 Runner-up
• Network is 16 layers (deep!)
• Easy to fine-tune
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
64. Fine-tuning A CNN To Solve A New Problem
• Cut off last layer of pre-trained ImageNet-winning CNN
• Keep learned network (convolutions) but replace final layer
• Can learn to predict new (completely different) classes
• Fine-tuning re-trains just the new final layer to learn the new task
68. Fine-tuning A CNN To Solve A New Problem
• Fix weights in convolutional layers (set trainable=False)
• Remove final dense layer that predicts 1000 ImageNet classes
• Replace with new dense layer to predict 9 categories
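These three steps can be sketched with the `tensorflow.keras` API (`weights=None` is used here so the sketch runs offline; in practice you would pass `weights="imagenet"` to download the pre-trained filters):

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

# Load VGG16 without its final 1000-class ImageNet dense layers.
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))

# Fix the convolutional weights: keep the learned features as-is.
for layer in base.layers:
    layer.trainable = False

# New dense head predicting 9 categories instead of 1000 ImageNet classes.
x = Flatten()(base.output)
outputs = Dense(9, activation="softmax")(x)
model = Model(inputs=base.input, outputs=outputs)

model.compile(optimizer="adam", loss="categorical_crossentropy")
# model.fit(train_images, train_labels, ...)  # only the new head trains
```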
88% accuracy in under 2 minutes for
classifying products into categories
Fine-tuning is awesome!
Insert obligatory brain analogy
69. Visual Similarity
• Chop off last 2 VGG layers
• Use dense layer with 4096 activations
• Compute nearest neighbours in the space of these activations
https://memeburn.com/2017/06/spree-image-search/
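Assuming the 4096-d activation vectors have already been extracted for each catalogue image, the nearest-neighbour search itself is a small numpy sketch (the embeddings here are random stand-ins; cosine similarity is one common choice of distance):

```python
import numpy as np

def nearest_neighbours(query, embeddings, k=3):
    """Rank images by cosine similarity between their dense-layer
    activation vectors and the query image's vector."""
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ q                  # cosine similarity per image
    return np.argsort(-sims)[:k]  # indices of the k most similar images

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((100, 4096))  # one vector per image
query = embeddings[42] + 0.01 * rng.standard_normal(4096)  # near image 42
print(nearest_neighbours(query, embeddings))  # image 42 should rank first
```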
83. Image & Video Moderation
Large international gay dating app with tens of millions of users
uploading hundreds of thousands of photos per day
84. Estimating Accident Repair Cost from Photos
Prototype for large SA insurer
Detect car make & model from registration disk
Predict repair cost using learned model