1. Dynamic Routing Between Capsules
Sara Sabour, Nicholas Frosst, Geoffrey E Hinton, 10, 2017, Arxiv
LAB SEMINAR
2017.11.13
SNU DATAMINING CENTER
MINKI CHUNG
2. TABLE OF CONTENTS
▸ Intuition
▸ Problems of ConvNet
▸ How brain works, Inverse graphics
▸ Capsule Theory
▸ CapsNet
▸ Capsule
▸ CapsNet architecture
▸ Experiment
▸ Classification on MNIST
▸ Reconstruction on MNIST
▸ Dimension perturbation on MNIST
▸ Discussion
4. PROBLEMS OF CONVNET
▸ ConvNet Architecture
THE PROBLEM IS ‘POOLING’
https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc
Pooling is used to obtain translational and rotational invariance
5. PROBLEMS OF CONVNET
@REDDIT, MACHINE LEARNING
https://www.reddit.com/r/MachineLearning/comments/2lmo0l/ama_geoffrey_hinton/clyj4jv/
6. PROBLEMS OF CONVNET
WHAT IS THIS PICTURE?
https://hackernoon.com/capsule-networks-are-shaking-up-ai-heres-how-to-use-them-c233a0971952
7. PROBLEMS OF CONVNET
HOW ABOUT THIS?
https://hackernoon.com/capsule-networks-are-shaking-up-ai-heres-how-to-use-them-c233a0971952
8. PROBLEMS OF CONVNET
NEED EQUIVARIANCE, NOT INVARIANCE
https://hackernoon.com/capsule-networks-are-shaking-up-ai-heres-how-to-use-them-c233a0971952
9. HOW BRAIN WORKS, INVERSE GRAPHICS
▸ Computer graphics constructs a visual image from an internal hierarchical representation of
geometric data
▸ The internal representation is stored in the computer’s memory as arrays of geometrical
objects and matrices that represent the relative positions and orientations of these
objects
▸ Special software takes that representation and converts it into an image on the screen.
This is called rendering
▸ Brains, in fact, do the opposite of rendering. Hinton calls it inverse graphics: from the visual
information received by the eyes, they deconstruct a hierarchical representation of the
world around us and try to match it with already learned patterns and relationships
stored in the brain
▸ The key idea is that the representation of objects in the brain does not depend on the view angle
COMPUTER GRAPHICS
https://medium.com/@pechyonkin/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b
10. CAPSULE THEORY
▸ In 3D graphics, relationships between 3D objects can be represented by a so-called
pose, which is in essence translation plus rotation
▸ Capsule approach: it incorporates the relative relationships between objects (the internal
representation), represented numerically as a 4x4 pose matrix
▸ ‘Dynamic Routing’ (more details later) allows capsules to communicate with each other
and create representations similar to scene graphs in computer graphics
https://medium.com/@pechyonkin/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b
YOU CAN EASILY RECOGNIZE THAT THIS IS THE STATUE OF LIBERTY,
EVEN THOUGH ALL THE IMAGES SHOW IT FROM DIFFERENT ANGLES
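The pose idea on this slide can be made concrete with a small numerical sketch. This is not taken from the slides or the paper: it uses 2D homogeneous 3x3 matrices instead of full 3D transforms to stay short, and the names `pose`, `face_to_nose` and all example values are hypothetical. It illustrates that the pose of a part changes with the viewpoint, while the part-whole relationship recovered from the poses stays the same.

```python
import math

def pose(theta, tx, ty):
    """Homogeneous 3x3 matrix: rotation by theta followed by translation (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx],
            [s,  c, ty],
            [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def inverse(p):
    """Inverse of a rigid transform [R t; 0 1], which is [R^T  -R^T t; 0 1]."""
    rt = [[p[0][0], p[1][0]],
          [p[0][1], p[1][1]]]
    tx = -(rt[0][0] * p[0][2] + rt[0][1] * p[1][2])
    ty = -(rt[1][0] * p[0][2] + rt[1][1] * p[1][2])
    return [[rt[0][0], rt[0][1], tx],
            [rt[1][0], rt[1][1], ty],
            [0.0, 0.0, 1.0]]

# Hypothetical part-whole relation: where the nose sits relative to the face.
face_to_nose = pose(0.1, 0.0, -0.5)

# The same face seen from two different viewpoints (hypothetical values).
face_view_a = pose(0.0, 2.0, 1.0)
face_view_b = pose(math.pi / 3, -1.0, 4.0)

# Pose of the nose in each view: it changes with the viewpoint.
nose_view_a = matmul(face_view_a, face_to_nose)
nose_view_b = matmul(face_view_b, face_to_nose)

# Relation recovered from the observed poses: identical in both views.
relation_a = matmul(inverse(face_view_a), nose_view_a)
relation_b = matmul(inverse(face_view_b), nose_view_b)
```

Here `nose_view_a` and `nose_view_b` differ, but `relation_a` and `relation_b` both equal `face_to_nose` up to floating-point error, which is the viewpoint-independent knowledge the slide refers to.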
11. CAPSULE THEORY
▸ Benefits:
▸ Better understanding of 3D space
▸ Achieve state-of-the-art performance using only a fraction of the data that a CNN
would use
▸ To learn to tell digits apart, the human brain needs only a couple of dozen
examples, hundreds at most, while a CNN needs tens of thousands of examples
https://medium.com/@pechyonkin/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b
13. CAPSULE
▸ Comparison with traditional neuron
https://www.zhihu.com/question/67287444/answer/251460831
VECTOR LENGTH WORKS LIKE A PROBABILITY
ACTIVATION OF NEXT CAPSULE
DYNAMIC ROUTING
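The idea that a capsule's vector length behaves like a probability comes from the squashing non-linearity in the paper (Sabour et al., 2017), v = (|s|^2 / (1 + |s|^2)) * (s / |s|). Below is a minimal pure-Python sketch of it. The function names and the small `eps` term (added here only to avoid division by zero) are assumptions of this example, not part of the slides.

```python
import math

def length(v):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def squash(s, eps=1e-9):
    """Shrink short vectors toward 0 and long vectors toward length 1,
    keeping the direction of s unchanged."""
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)
    return [scale * x for x in s]

strong = squash([3.0, 4.0])    # input length 5, output length 25/26
weak = squash([0.01, 0.0])     # input length 0.01, output length close to 0
```

The output length always stays below 1, so it can be read as the probability that the entity represented by the capsule is present, while the direction still encodes its properties.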
14. CAPSNET ARCHITECTURE
ARCHITECTURE
Sara Sabour, Nicholas Frosst, Geoffrey E Hinton, 10, 2017, Arxiv. Dynamic Routing Between Capsules
[Figure: MNIST -> CONV -> CAPS.CONV (8 x 32) -> CAPS.FC, with DYNAMIC ROUTING between CAPS.CONV and CAPS.FC]
MNIST
LOCAL FEATURE DETECTION
6*6*32=1152 CAPSULES,
EACH HAS 8 PROPERTIES
10 CAPSULES (CLASS),
EACH HAS 16 PROPERTIES
DEEPER MEANS MORE COMPLEX, DIMENSION SHOULD INCREASE
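The capsule counts above can be checked with a few lines of arithmetic. The sketch below assumes the layer settings reported in the paper (28x28 MNIST input, 9x9 kernels without padding, stride 1 in the first conv layer and stride 2 in the capsule conv layer), which are not written out on the slide. The variable names are hypothetical.

```python
def conv_out(size, kernel, stride):
    """Output width of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1

# Assumed hyperparameters from the paper, not from the slide.
conv1_size = conv_out(28, kernel=9, stride=1)             # regular conv layer
primary_size = conv_out(conv1_size, kernel=9, stride=2)   # capsule conv layer

# 32 capsule channels on the spatial grid, each capsule with 8 properties.
num_primary_caps = primary_size * primary_size * 32
primary_caps_dim = 8

# 10 class capsules with 16 properties each. One 8x16 transformation matrix
# per (lower capsule, class capsule) pair gives the routing layer its weights.
num_digit_caps, digit_caps_dim = 10, 16
digitcaps_weights = num_primary_caps * num_digit_caps * primary_caps_dim * digit_caps_dim

print(conv1_size, primary_size, num_primary_caps, digitcaps_weights)
```

Under these assumptions the spatial grid is 6x6, which reproduces the 6*6*32 = 1152 capsules on the slide.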
15. CAPSNET ARCHITECTURE
▸ naturomics github
CAPSNET-TENSORFLOW
[Figure: MNIST -> CONV -> CAPS.CONV (X 32, X 8) -> CAPS.FC, with DYNAMIC ROUTING]
https://github.com/naturomics/CapsNet-Tensorflow
16. CAPSNET ARCHITECTURE
▸ Place-coded Capsule
▸ Concatenate the outputs of 8 different regular conv layers
▸ Treat each position in each of the 32 channels as a capsule (6*6*32=1152 capsules with 8
properties)
CAPS.CONV, PRIMARYCAPS
https://github.com/naturomics/CapsNet-Tensorflow
DIRECTION
17. CAPSNET ARCHITECTURE
▸ Apply the squashing function at the end
CAPS.CONV, PRIMARYCAPS
https://github.com/naturomics/CapsNet-Tensorflow
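The PrimaryCaps step described on these slides can be sketched as a regrouping of conv outputs followed by squashing. This is a minimal pure-Python illustration, not the code of the linked repository: the channel layout (every 8 consecutive channels at one position form one capsule), the random input, and the names `to_primary_caps` and `feature_map` are assumptions made for this example.

```python
import math
import random

def squash(s, eps=1e-9):
    """Squashing non-linearity: keeps direction, maps length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / (math.sqrt(norm_sq) + eps)
    return [scale * x for x in s]

def to_primary_caps(feature_map, caps_dim=8):
    """Split an H x W x C feature map (nested lists) into capsules of
    caps_dim values each and squash every capsule."""
    capsules = []
    for row in feature_map:
        for pixel in row:
            for start in range(0, len(pixel), caps_dim):
                capsules.append(squash(pixel[start:start + caps_dim]))
    return capsules

# Stand-in for the concatenated conv outputs: 6 x 6 positions, 32 * 8 channels.
random.seed(0)
feature_map = [[[random.gauss(0.0, 1.0) for _ in range(32 * 8)]
                for _ in range(6)]
               for _ in range(6)]

capsules = to_primary_caps(feature_map)
print(len(capsules), len(capsules[0]))
```

Each of the resulting capsules is an 8-dimensional vector whose length is below 1 after squashing.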
19. CAPSNET ARCHITECTURE
▸ Dynamic Routing
▸ Top-down feedback
▸ Routing by agreement
▸ Works like attention
CAPS.FC, DIGITCAPS
https://github.com/naturomics/CapsNet-Tensorflow
IF MULTIPLE PREDICTIONS
AGREE, HIGHER LEVEL CAPSULE
BECOMES ACTIVE
VECTOR LENGTH WORKS LIKE A PROBABILITY
ACTIVATION OF NEXT CAPSULE
COUPLING COEFFICIENTS
TOP-DOWN FEEDBACK: IF A RELATION EXISTS, COUPLING COEFFICIENTS INCREASE
AGREEMENT
20. CAPSNET ARCHITECTURE
▸ Dynamic Routing
CAPS.FC, DIGITCAPS
https://github.com/naturomics/CapsNet-Tensorflow
DYNAMIC ROUTING
3 ITERATIONS WILL DO
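Routing by agreement can be sketched in pure Python as follows. The loop follows the routing procedure described in the paper (softmax over routing logits, weighted sum, squash, then add the agreement to the logits), but this is only an illustration: it starts from ready-made prediction vectors `u_hat` and skips the learned transformation matrices that produce them, and the toy input values and function names are assumptions of this example.

```python
import math

def squash(s, eps=1e-9):
    """Squashing non-linearity: keeps direction, maps length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / (math.sqrt(norm_sq) + eps)
    return [scale * x for x in s]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(u_hat, iterations=3):
    """u_hat[i][j] is the prediction of lower capsule i for higher capsule j.
    Returns the higher-capsule outputs v and the coupling coefficients c
    used in the final iteration."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]   # routing logits, start at 0
    for _ in range(iterations):
        c = [softmax(row) for row in b]        # each lower capsule splits its output
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        for i in range(n_in):                  # top-down feedback: agreement raises b
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c

# Toy input: 3 lower capsules, 2 higher capsules, 2-dimensional predictions.
# Predictions for higher capsule 0 agree; those for capsule 1 conflict.
u_hat = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[0.9, 0.1], [0.0, 1.0]],
]
v, c = route(u_hat, iterations=3)
```

In this toy case every lower capsule ends up sending more of its output to higher capsule 0, where the predictions agree, and that capsule's output vector is the longer one.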
22. EXPERIMENT
▸ This talk introduces the first three experiments
▸ Classification on MNIST (99.75%, conv 99.61%)
▸ Reconstruction on MNIST
▸ Dimension Perturbation on MNIST
▸ Robustness to Affine Transformation on MNIST (79%, conv 66%)
▸ Classification on MultiMNIST (5% error)
▸ Classification on CIFAR 10 (10.6% error - ZFNet)
▸ Classification on SVHN (4.3% error)
Sara Sabour, Nicholas Frosst, Geoffrey E Hinton, 10, 2017, Arxiv. Dynamic Routing Between Capsules
23. EXPERIMENT
▸ 99.75% (baseline 99.61%)
1. CLASSIFICATION ON MNIST
Sara Sabour, Nicholas Frosst, Geoffrey E Hinton, 10, 2017, Arxiv. Dynamic Routing Between Capsules
24. EXPERIMENT
2. RECONSTRUCTION ON MNIST
Sara Sabour, Nicholas Frosst, Geoffrey E Hinton, 10, 2017, Arxiv. Dynamic Routing Between Capsules
25. EXPERIMENT
3. DIMENSION PERTURBATION ON MNIST
Sara Sabour, Nicholas Frosst, Geoffrey E Hinton, 10, 2017, Arxiv. Dynamic Routing Between Capsules
28. DISCUSSION
▸ A regular conv layer is still used first for local feature extraction
▸ Can capsules not extract local features themselves?
STILL USE CONV LAYER
HOW TO RESTRICT TO GET CERTAIN FEATURE?
▸ Disentangling features
▸ How to obtain ‘certain features’?
30. REFERENCE
▸ Sara Sabour, Nicholas Frosst, Geoffrey E Hinton, 10, 2017, Arxiv. Dynamic Routing Between Capsules (https://arxiv.org/abs/1710.09829)
▸ Geoffrey Hinton et al., Matrix Capsules With EM Routing, Under review as a conference paper at ICLR 2018 (https://openreview.net/pdf?id=HJWLfGWRb)
▸ https://medium.com/@pechyonkin/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b
▸ https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc
▸ https://hackernoon.com/capsule-networks-are-shaking-up-ai-heres-how-to-use-them-c233a0971952
▸ https://github.com/naturomics/CapsNet-Tensorflow
▸ https://www.zhihu.com/question/67287444/answer/251460831
▸ https://www.reddit.com/r/MachineLearning/comments/2lmo0l/ama_geoffrey_hinton/clyj4jv/
▸ Geoffrey Hinton: "Does the Brain do Inverse Graphics?" (https://www.youtube.com/watch?v=TFIMqt0yT2I&feature=youtu.be)
▸ Geoffrey Hinton talk "What is wrong with convolutional neural nets?" (https://www.youtube.com/watch?v=rTawFwUvnLE&t=1214s)
▸ https://www.youtube.com/watch?v=u50nqWMQe1k