Capsule Networks
Aurélien Géron, November 2017
https://youtu.be/pPN8d0E3900
NIPS 2017 Paper
Dynamic Routing Between Capsules
by Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton
October 2017: https://arxiv.org/abs/1710.09829
Computer Graphics
Rectangle
x=20
y=30
angle=16°
Triangle
x=24
y=25
angle=-65°
Instantiation parameters → Rendering → Image

Inverse Graphics
Image → Inverse rendering → Instantiation parameters
Rectangle
x=20
y=30
angle=16°
Triangle
x=24
y=25
angle=-65°
Capsules
Image → Inverse rendering → Capsule activations
Capsules
Activation vector:
● Length = estimated probability of presence
● Orientation = object’s estimated pose parameters
Capsules
Capsules = Convolutional Layers + Reshape + Squash
squash(u) = ||u||² / (1 + ||u||²) · u / ||u||
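A minimal NumPy sketch of this squash function (the eps term is my addition to avoid division by zero; it is not part of the slide’s formula):

```python
import numpy as np

def squash(u, axis=-1, eps=1e-9):
    """squash(u) = ||u||^2 / (1 + ||u||^2) * u / ||u||
    Shrinks each vector to length < 1 while preserving its orientation."""
    squared_norm = np.sum(u ** 2, axis=axis, keepdims=True)
    norm = np.sqrt(squared_norm + eps)  # eps avoids division by zero
    return (squared_norm / (1.0 + squared_norm)) * (u / norm)
```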
Equivariance
A hierarchy of parts
Boat
x=22
y=28
angle=16°
A hierarchy of parts
Rectangle
x=20
y=30
angle=16°
Triangle
x=24
y=25
angle=-65°
Boat
x=22
y=28
angle=16°
A hierarchy of parts
Rectangle
x=20
y=30
angle=-5°
Triangle
x=26
y=31
angle=137°
House
x=22
y=28
angle=-5°
Primary Capsules
Predict Next Layer’s Output
One transformation matrix Wi,j per part/whole pair (i, j):
ûj|i = Wi,j ui
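As a rough NumPy sketch of this prediction step (the shapes, dimensions, and variable names are illustrative assumptions, not values from the paper):

```python
import numpy as np

n_primary, n_next = 2, 2   # e.g., rectangle/triangle -> house/boat
d_in, d_out = 8, 16        # illustrative capsule dimensions

# One learned transformation matrix W[i, j] per part/whole pair (i, j).
W = np.random.randn(n_primary, n_next, d_out, d_in)
u = np.random.randn(n_primary, d_in)   # primary capsule outputs u_i

# u_hat[i, j] = W[i, j] @ u[i]: capsule i's prediction for capsule j's output
u_hat = np.einsum('ijkl,il->ijk', W, u)
print(u_hat.shape)  # (2, 2, 16)
```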
Compute Next Layer’s Output
Routing by Agreement
Strong agreement! The rectangle and triangle capsules should be routed to the boat capsule.
Clusters of Agreement
Routing Weights
bi,j = 0 for all i, j
ci = softmax(bi) → initially, every routing weight is 0.5
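A small NumPy sketch of this initialization (shapes reuse the illustrative 2×2 example above):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stable softmax
    return e / e.sum(axis=axis, keepdims=True)

n_primary, n_next = 2, 2
b = np.zeros((n_primary, n_next))  # raw routing weights b_ij = 0
c = softmax(b, axis=1)             # c_i = softmax(b_i) -> 0.5 everywhere
print(c)
```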
Compute Next Layer’s Output
sj = weighted sum of the predicted outputs
vj = squash(sj)
→ actual outputs of the next-layer capsules (round #1)
Update Routing Weights
Agreement: bi,j += ûj|i · vj
Strong agreement → large scalar product → the routing weight grows
Disagreement → small scalar product → the routing weight barely grows
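Putting the pieces together, here is a minimal sketch of the full routing-by-agreement inner loop (shapes and the number of rounds are illustrative; real implementations vectorize this over a batch):

```python
import numpy as np

def squash(u, axis=-1, eps=1e-9):
    sq = np.sum(u ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * (u / np.sqrt(sq + eps))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def routing_by_agreement(u_hat, n_rounds=3):
    """u_hat[i, j]: capsule i's predicted output for next-layer capsule j."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                 # raw routing weights
    for _ in range(n_rounds):
        c = softmax(b, axis=1)                  # c_i = softmax(b_i)
        s = np.einsum('ij,ijk->jk', c, u_hat)   # s_j = weighted sum
        v = squash(s)                           # v_j = squash(s_j)
        b += np.einsum('ijk,jk->ij', u_hat, v)  # b_ij += u_hat_j|i . v_j
    return v

v = routing_by_agreement(np.random.randn(2, 2, 16))  # illustrative shapes
```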
Compute Next Layer’s Output (round #2)
Updated routing weights: 0.2 and 0.1 toward the house capsule, 0.8 and 0.9 toward the boat capsule
sj = weighted sum
vj = squash(sj)
→ actual outputs of the next-layer capsules (round #2)
Handling Crowded Scenes
Is this an upside-down house?
House + Boat: thanks to routing by agreement, the ambiguity is quickly resolved (explaining away).
Classification
CapsNet → ||vk|| (ℓ2 norm) → estimated class probability
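As a sketch, classification is then just a vector norm (the v array below is a placeholder; in a real CapsNet it comes out of squash, so each length is below 1):

```python
import numpy as np

v = np.random.rand(10, 16) * 0.1          # placeholder: 10 output capsules, 16D
class_proba = np.linalg.norm(v, axis=-1)  # length = estimated probability
y_pred = int(np.argmax(class_proba))
```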
Training
||vk|| (ℓ2 norm) → estimated class probability
To allow multiple classes, minimize the margin loss:
Lk = Tk · max(0, m+ − ||vk||)² + λ · (1 − Tk) · max(0, ||vk|| − m-)²
Tk = 1 iff class k is present
In the paper: m+ = 0.9, m- = 0.1, λ = 0.5
Translated to English: “If an object of class k is present, then ||vk|| should be no less than 0.9. If not, then ||vk|| should be no more than 0.1.”
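A NumPy sketch of this margin loss (the v_norm and T values below are placeholders):

```python
import numpy as np

def margin_loss(v_norm, T, m_plus=0.9, m_minus=0.1, lam=0.5):
    """v_norm[k] = ||v_k||; T[k] = 1 iff class k is present."""
    present = T * np.maximum(0.0, m_plus - v_norm) ** 2
    absent = lam * (1.0 - T) * np.maximum(0.0, v_norm - m_minus) ** 2
    return float(np.sum(present + absent))

v_norm = np.array([0.95, 0.20, 0.05])  # placeholder capsule lengths
T = np.array([1.0, 0.0, 0.0])          # only class 0 is present
print(margin_loss(v_norm, T))
```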
Regularization by Reconstruction
Capsule outputs → Feedforward Neural Network Decoder → Reconstruction
Loss = margin loss + α · reconstruction loss
The reconstruction loss is the squared difference between the reconstructed image and the input image.
In the paper, α = 0.0005.
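As a sketch, the full training loss (α = 0.0005 as in the paper; the image arrays are placeholders):

```python
import numpy as np

def total_loss(margin, image, reconstruction, alpha=0.0005):
    # Squared difference, scaled down so the margin loss dominates training.
    reconstruction_loss = np.sum((reconstruction - image) ** 2)
    return margin + alpha * reconstruction_loss

image = np.zeros((28, 28))                # placeholder input image
reconstruction = np.ones((28, 28)) * 0.1  # placeholder decoder output
print(total_loss(0.2, image, reconstruction))
```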
A CapsNet for MNIST
(Figure 1 from the paper)
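To make the architecture concrete, here is a shape walkthrough in plain Python, based on the description of Figure 1 in the speaker notes below (the Conv1 details are from the paper; treat the numbers as a sketch):

```python
# 28x28 input -> Conv1 (256 maps, 9x9 kernels, stride 1) -> 20x20x256
# -> PrimaryCaps (9x9 kernels, stride 2) -> 6x6 grid, 32 capsules per cell, 8D
# -> DigitCaps: 10 capsules, 16D each
n_primary = 6 * 6 * 32   # 1152 primary capsules
d_primary, d_digit, n_classes = 8, 16, 10

# One 16x8 transformation matrix per (primary capsule, digit capsule) pair:
n_transform_params = n_primary * n_classes * d_digit * d_primary
print(n_primary, n_transform_params)  # 1152 capsules, ~1.47M parameters in W
```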
A CapsNet for MNIST – Decoder
(Figure 2 from the paper)
Interpretable Activation Vectors
(Figure 4 from the paper)
Pros
● Reaches high accuracy on MNIST, with promising results on CIFAR10
● Requires less training data
● Position and pose information are preserved (equivariance)
● This is promising for image segmentation and object detection
● Routing by agreement is great for overlapping objects (explaining away)
● Capsule activations nicely map the hierarchy of parts
● Offers robustness to affine transformations
● Activation vectors are easier to interpret (rotation, thickness, skew…)
● It’s Hinton! ;-)
Cons
● Not state of the art on CIFAR10 (but it’s a good start)
● Not yet tested on larger images (e.g., ImageNet): will it work well?
● Slow training, due to the inner loop in the routing-by-agreement algorithm
● A CapsNet cannot see two very close identical objects
○ This is called “crowding”, and it has been observed in human vision as well
Implementations
● Keras w/ TensorFlow backend: https://github.com/XifengGuo/CapsNet-Keras
● TensorFlow: https://github.com/naturomics/CapsNet-Tensorflow
● PyTorch: https://github.com/gram-ai/capsule-networks
Amazon: https://goo.gl/IoWYKD
Twitter: @aureliengeron
github.com/ageron
Editor’s notes
  1. This presentation will tell you all about Capsule Networks, a hot new architecture for neural nets. Geoffrey Hinton had the idea of Capsule Networks several years ago, and he published a paper in 2011 that introduced many of the key ideas, but he had a hard time making them work properly, until now.
  2. A few weeks ago, in October 2017, a paper called “Dynamic Routing Between Capsules” was published by Sara Sabour, Nicholas Frosst and of course Geoffrey Hinton. They managed to reach state of the art performance on the MNIST dataset, and demonstrated considerably better results than convolutional neural nets on highly overlapping digits. So what are capsule networks exactly?
  3. Well, in computer graphics, you start with an abstract representation of a scene, for example a rectangle at position x=20 and y=30, rotated by 16°, and so on. Each object type has various instantiation parameters. Then you call some rendering function, and boom, you get an image.
  4. Inverse graphics is just the reverse process. You start with an image, and you try to find what objects it contains, and what their instantiation parameters are. A capsule network is basically a neural network that tries to perform inverse graphics.
  5. It is composed of many capsules. A capsule is any function that tries to predict the presence and the instantiation parameters of a particular object at a given location. For example, the network above contains 50 capsules. The arrows represent the output vectors of these capsules: the black arrows correspond to capsules that try to find rectangles, while the blue arrows represent the output of capsules looking for triangles. The length of an activation vector represents the estimated probability that the object the capsule is looking for is indeed present. You can see that most arrows are tiny, meaning the capsules didn’t detect anything, but two arrows are quite long. This means that the capsules at these locations are pretty confident that they found what they were looking for, in this case a rectangle, and a triangle.
  6. Next, the orientation of the activation vector encodes the instantiation parameters of the object, for example in this case the object’s rotation, but it could also be its thickness, how stretched or skewed it is, its exact position (there might be slight translations), and so on. For simplicity, I’ll just focus on the rotation parameter, but in a real capsule network, the activation vectors may have 5 or 10 dimensions, or more.
  7. In practice, a good way to implement this is to first apply a couple of convolutional layers, just like in a regular convolutional neural net. This will output an array containing a bunch of feature maps. You can then reshape this array to get a set of vectors for each location. For example, suppose the convolutional layers output an array containing, say, 18 feature maps (2 times 9); you can easily reshape this array to get 2 vectors of 9 dimensions each, for every location. You could also get 3 vectors of 6 dimensions each, and so on. This would look like the capsule network represented here, with two vectors at each location. The last step is to ensure that no vector is longer than 1: since the vector’s length is meant to represent a probability, it cannot be greater than 1. To do this, we apply a squashing function. It preserves the vector’s orientation, but squashes it to ensure that its length is between 0 and 1.
  8. One key feature of Capsule Networks is that they preserve detailed information about the object’s location and its pose, throughout the network. For example, if I rotate the image slightly...
  9. ...notice that the activation vectors also change slightly. Right? This is called equivariance. In a regular convolutional neural net, there are generally several pooling layers, and unfortunately these pooling layers tend to lose information, such as the precise location and pose of the objects. It’s really not a big deal if you just want to classify the whole image, but it makes it challenging to perform accurate image segmentation or object detection (which require precise location and pose). The fact that capsules are equivariant makes them very promising for these applications.
  10. All right, so now let’s see how capsule networks can handle objects that are composed of a hierarchy of parts. For example, consider a boat centered at position x=22 and y=28, and rotated by 16°. This boat is composed of parts. In this case one rectangle and one triangle.
  11. So this is how it would be rendered. Now we want to do the reverse, we want inverse graphics, so we want to go from the image to this whole hierarchy of parts with their instantiation parameters.
  12. Similarly, we could also draw a house, using the same parts, a rectangle and a triangle, but this time organized in a different way. So the trick will be to try to go from this image containing a rectangle and a triangle, and figure out, not only that the rectangle and triangle are at this location and this orientation, but also that they are part of a boat, not a house. So let’s figure out how it would do this.
  13. The first step we have already seen: we run a couple convolutional layers, we reshape the output to get vectors, and we squash them. This gives us the output of the primary capsules. We’ve got the first layer already. The next step is where most of the magic and complexity of capsule networks takes place. Every capsule in the first layer tries to predict the output of every capsule in the next layer.
  14. For example, let’s consider the capsule that detected the rectangle. I’ll call it the rectangle-capsule.
  15. Let’s suppose that there are just two capsules in the next layer, the house-capsule and the boat-capsule. Since the rectangle-capsule detected a rectangle rotated by 16°, it predicts that the house-capsule will detect a house rotated by 16°, that makes sense, and the boat-capsule will detect a boat rotated by 16° as well. That’s what would be consistent with the orientation of the rectangle.
  16. So, to make this prediction, the rectangle-capsule simply multiplies a transformation matrix W_i,j by its own activation vector u_i. During training, the network will gradually learn a transformation matrix for each pair of capsules in the first and second layer. In other words, it will learn all the part-whole relationships, for example the angle between the wall and the roof of a house, and so on.
  17. Now let’s see what the triangle-capsule predicts.
  18. This time, it’s a bit more interesting: given the rotation angle of the triangle, it predicts that the house-capsule will detect an upside-down house, and that the boat-capsule will detect a boat rotated by 16°. These are the positions that would be consistent with the rotation angle of the triangle.
  19. Now we have a bunch of predicted outputs, what do we do with them?
  20. As you can see, the rectangle-capsule and the triangle-capsule strongly agree on what the boat-capsule will output. In other words, they agree that a boat positioned in this way would explain their own positions and rotations. And they totally disagree on what the house-capsule will output. Therefore, it makes sense to assume that the rectangle and triangle are part of a boat, not a house.
  21. Now that we know that the rectangle and triangle are part of a boat, the outputs of the rectangle capsule and the triangle capsule really concern only the boat capsule, there’s no need to send these outputs to any other capsule, this would just add noise. They should be sent only to the boat capsule. This is called routing by agreement. There are several benefits: first, since capsule outputs are only routed to the appropriate capsule in the next layer, these capsules will get a cleaner input signal and will more accurately determine the pose of the object. Second, by looking at the paths of the activations, you can easily navigate the hierarchy of parts, and know exactly which part belongs to which object (like, the rectangle belongs to the boat, or the triangle belongs to the boat, and so on). Lastly, routing by agreement helps parse crowded scenes with overlapping objects (we will see this in a few slides). But first, let’s look at how routing by agreement is implemented in Capsule Networks.
  22. Here, I have represented the various poses of the boat, as predicted by the lower-level capsules. For example, one of these circles may represent what the rectangle-capsule thinks about the most likely pose of the boat, and another circle may represent what the triangle-capsule thinks, and if we suppose that there are many other low-level capsules, then we might get a cloud of prediction vectors, for the boat capsule, like this. In this example, there are two pose parameters: one represents the rotation angle, and the other represents the size of the boat. As I mentioned earlier, pose parameters may capture many different kinds of visual features, like skew, thickness, and so on. Or precise location. So the first thing we do, is we compute the mean of all these predictions.
  23. This gives us this vector. The next step is to measure the distance between each predicted vector and the mean vector. I will use the Euclidean distance here, but capsule networks actually use the scalar product. Basically, we want to measure how much each predicted vector agrees with the mean predicted vector. Using this agreement measure, we can update the weight of every predicted vector accordingly.
  24. Note that the predicted vectors that are far from the mean now have a very small weight, and the ones closest to the mean have a much stronger weight. I’ve represented them in black. Now we can just compute the mean once again (or I should say, the weighted mean).
  25. And you’ll notice that it moves slightly towards the center of the cluster. So next, we can once again update the weights.
  26. And now most of the vectors within the cluster have turned black. And again, we can update the mean.
  27. And we can repeat this process a few times. In practice, 3 to 5 iterations are generally sufficient. This might remind you of the k-means clustering algorithm, if you know it. Okay, so this is how we find clusters of agreement. Now let’s see how the whole algorithm works in a bit more detail.
  28. First, for every predicted output, we start by setting a raw routing weight b_i,j equal to 0.
  29. Next, we apply the softmax function to these raw weights, for each primary capsule. This gives the actual routing weights for each predicted output, in this example 0.5 each.
  30. Next we compute a weighted sum of the predictions, for each capsule in the next layer.
  31. This might give vectors longer than 1, so as usual we apply the squash function.
  32. And voilà! We now have the actual outputs of the house-capsule and boat-capsule. But this is not the final output, it’s just the end of the first round, the first iteration.
  33. Now we can see which predictions were most accurate. For example, the rectangle-capsule made a great prediction for the boat-capsule’s output. It really matches it pretty closely.
  34. This is estimated by computing the scalar product of the predicted output vector û_j|i and the actual output vector v_j. This scalar product is simply added to the predicted output’s raw routing weight, b_i,j. So the weight of this particular predicted output is increased.
  35. When there is a strong agreement, this scalar product is large, so good predictions will have a higher weight.
  36. On the other hand, the rectangle-capsule made a pretty bad prediction for the house-capsule’s output, so the scalar product in this case will be quite small, and the raw routing weight of this predicted vector will not grow much.
  37. Next, we update the routing weights by computing the softmax of the raw weights, once again. And as you can see, the rectangle-capsule’s predicted vector for the boat-capsule now has a weight of 0.8, while its predicted vector for the house-capsule dropped down to 0.2. So most of its output is now going to go to the boat capsule, not the house capsule.
  38. Once again we compute the weighted sum of all the predicted output vectors for each capsule in the next layer, that is the house-capsule and the boat-capsule. And this time, the house-capsule gets so little input that its output is a tiny vector. On the other hand the boat-capsule gets so much input that it outputs a vector much longer than 1. So again we squash it.
  39. And that’s the end of round #2.
  40. And as you can see, in just a couple iterations, we have already ruled out the house and clearly chosen the boat. After perhaps one or two more rounds, we can stop and proceed to the next capsule layer in exactly the same way.
  41. So as I mentioned earlier, routing by agreement is really great to handle crowded scenes, such as the one represented in this image.
  42. One way to interpret this image (as you can see, there is a bit of ambiguity) is that there is an upside-down house in the middle. However, if this were the case, then there would be no explanation for the bottom rectangle or the top triangle, no reason for them to be where they are.
  43. The best way to interpret the image is that there is a house at the top and a boat at the bottom. And routing by agreement will tend to choose this solution, since it makes all the capsules perfectly happy, each of them making perfect predictions for the capsules in the next layer. The ambiguity is explained away. Okay, so what can you do with a capsule network now that you know how it works?
  44. Well for one, you can create a nice image classifier of course. Just have one capsule per class in the top layer and that’s almost all there is to it. All you need to add is a layer that computes the length of the top-layer activation vectors, and this gives you the estimated class probabilities. You could then just train the network by minimizing the cross-entropy loss, as in a regular classification neural network, and you would be done.
  45. However, in the paper they use a margin loss that makes it possible to detect multiple classes in the image.
  46. So without going into too much detail, this margin loss is such that if an object of class k is present in the image, then the corresponding top-level capsule should output a vector whose length is at least 0.9. It should be long. Conversely, if an object of class k is not present in the image, then the capsule should output a short vector, one whose length is less than 0.1. So the total loss is the sum of losses for all classes.
  47. In the paper, they also add a decoder network on top of the capsule network. It’s just 3 fully connected layers with a sigmoid activation function in the output layer. It learns to reconstruct the input image by minimizing the squared difference between the reconstructed image and the input image.
  48. The full loss is the margin loss we discussed earlier, plus the reconstruction loss (scaled down considerably so as to ensure that the margin loss dominates training). The benefit of applying this reconstruction loss is that it forces the network to preserve all the information required to reconstruct the image, up to the top layer of the capsule network, its output layer. This constraint acts a bit like a regularizer: it reduces the risk of overfitting and helps generalize to new examples. And that’s it! You know how a capsule network works, and how to train it. Let’s look a little bit at some of the figures in the paper, which I find interesting.
  49. This is figure 1 from the paper, showing a full capsule network for MNIST. You can see the first two regular convolutional layers, whose output is reshaped and squashed to get the activation vectors of the primary capsules. And these primary capsules are organized in a 6 by 6 grid, with 32 primary capsules in each cell of this grid, and each primary capsule outputs an 8-dimensional vector. So this first layer of capsules is fully connected to the 10 output capsules, which output 16 dimensional vectors. The length of these vectors is used to compute the margin loss, as explained earlier.
  50. Now this is figure 2 from the paper. It shows the decoder sitting on top of the CapsNet. It is composed of 2 fully connected ReLU layers plus a fully connected sigmoid layer which outputs 784 numbers that correspond to the pixel intensities of the reconstructed image (which is a 28 by 28 pixel image). The squared difference between this reconstructed image and the input image gives the reconstruction loss.
  51. Right, and this is figure 4 from the paper. One nice thing about capsule networks is that the activation vectors are often interpretable. For example, this image shows the reconstructions that you get when you gradually modify one of the 16 dimensions of the top layer capsules’ output. You can see that the first dimension seems to represent scale and thickness. The fourth dimension represents a localized skew. The fifth represents the width of the digit plus a slight translation to get the exact position. So as you can see, it’s rather clear what most of these parameters do.
  52. Okay, to conclude, let’s summarize the pros and cons. Capsule networks have reached state of the art accuracy on MNIST. On CIFAR10, they got a bit over 10% error, which is far from state of the art, but it’s similar to what was first obtained with other techniques before years of effort were put into them, so it’s still a good start. Capsule networks require less training data. They offer equivariance, which means that position and pose information are preserved. And this is very promising for image segmentation and object detection. The routing by agreement algorithm is great for crowded scenes. The routing tree also maps the hierarchy of object parts, so every part is assigned to a whole. And it’s rather robust to rotations, translations and other affine transformations. The activation vectors are somewhat interpretable. And finally, obviously, it’s Hinton’s idea, so don’t bet against it.
  53. However, there are a few cons: first, as I mentioned the results are not yet state of the art on CIFAR10, even though it’s a good start. Plus, it’s still unclear whether capsule networks can scale to larger images, such as the ImageNet dataset. What will the accuracy be? Capsule networks are also quite slow to train, in large part because of the routing by agreement algorithm which has an inner loop, as you saw earlier. Finally, there is only one capsule of any given type in a given location, so it’s impossible for a capsule network to detect two objects of the same type if they are too close to one another. This is called crowding, and it has been observed in human vision as well, so it’s probably not a show-stopper.
  54. All right! I highly recommend you take a look at the code of a CapsNet implementation, such as the ones listed here (I’ll leave the links in the video description below). If you take your time, you should have no problem understanding everything the code is doing. The main difficulty in implementing CapsNets is that it contains an inner loop for the routing by agreement algorithm. Implementing loops in Keras and TensorFlow can be a little bit trickier than in PyTorch, but it can be done. If you don’t have a particular preference, then I would say that the PyTorch code is the easiest to understand.
  55. And that’s all I had, I hope you enjoyed this presentation. If you did, please visit my YouTube channel, like, share, comment, subscribe, etc. It’s my first real YouTube video, and if people find it useful, I might make some more. If you want to learn more about Machine Learning, Deep Learning and Deep Reinforcement Learning, you may want to read my O’Reilly book Hands-on Machine Learning with Scikit-Learn and TensorFlow. It covers a ton of topics, with many code examples that you will find on my github account, so I’ll leave the links in the video description. That’s all for today, have fun and see you next time!