Adaptive Computation Time in Deep Visual Learning at AI NEXT Conference

AI NEXT Conference 2017 Seattle by Li Zhang
Video: https://www.youtube.com/channel/UCj09XsAWj-RF9kY4UvBJh_A

1. Li Zhang, Google
2. Prevalent in computer vision in the last 5 years:
   ● Image classification
   ● Object detection
   ● Image segmentation
   ● Image captioning
   ● Visual question answering
   ● Image synthesis
   ● ...
3. More data, more compute => bigger models, better results.
   Okay as a cloud service; more challenging on mobile or even embedded devices.
4. ● Reduce the number of channels in each layer
   ● Low-rank decomposition
     ○ Speeding up convolutional neural networks with low rank expansions. BMVC, 2014.
     ○ Efficient and accurate approximations of nonlinear convolutional networks. CVPR, 2015.
     ○ ResNet, Inception
   ● Reduce connections
     ○ Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. ICLR, 2016.
5. ● Cascade classifiers
     ○ A convolutional neural network cascade for face detection. CVPR, 2015.
     ○ Exploit all the layers: Fast and accurate CNN object detector with scale dependent pooling and cascaded rejection classifiers. CVPR, 2016.
6. ● Glimpse-based attention models
     ○ Learning to combine foveal glimpses with a third-order Boltzmann machine. NIPS, 2010.
     ○ Recurrent models of visual attention. NIPS, 2014.
     ○ Multiple object recognition with visual attention. ICLR, 2015.
     ○ Spatial transformer networks. NIPS, 2015.
7. (same content as slide 6)
8. ● Switch on/off subnets
     ○ Dynamic capacity networks. ICML, 2016.
     ○ PerforatedCNNs: Acceleration through elimination of redundant convolutions. NIPS, 2016.
     ○ Conditional computation in neural networks for faster models. ICLR Workshop, 2016.
     ○ BranchyNet: Fast inference via early exiting from deep neural networks. ICPR, 2016.
9. (same content as slide 8)
10. (same content as slide 8)
11. [example images] http://mscoco.org/explore/?id=19431, https://en.wikipedia.org/wiki/Westphalian_horse
12. Consider an RNN where the output is the state: starting from s1 at t = 1, the cell F is applied at every step (t = 1 ... 6), producing s2 ... s6; the output is the final state s6. (https://arxiv.org/abs/1603.08983)
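
For reference, a minimal sketch of the fixed-computation RNN on this slide: the state is updated for a fixed number of steps and the output is simply the final state. The cell F here is a toy tanh cell, a stand-in rather than anything from the paper.

    import numpy as np

    def rnn_fixed_steps(x, s0, F, num_steps=6):
        """Plain RNN: apply the cell F a fixed number of times; output = final state."""
        s = s0
        for t in range(num_steps):
            s = F(s, x)      # s_{t+1} = F(s_t, x)
        return s             # output: s6

    # toy cell: a random linear map with a tanh nonlinearity (illustration only)
    rng = np.random.default_rng(0)
    W_s, W_x = 0.1 * rng.normal(size=(8, 8)), 0.1 * rng.normal(size=(8, 4))
    F = lambda s, x: np.tanh(W_s @ s + W_x @ x)
    output = rnn_fixed_steps(rng.normal(size=4), np.zeros(8), F)
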
13. RNN-ACT (https://arxiv.org/abs/1603.08983): a halting unit H outputs a halting probability at each step (here 0.01, 0.1, 0.7, 0.5, ...). Computation stops once the cumulative sum exceeds 1 − ε, which happens at t = 4. The remainder is 1 − 0.01 − 0.1 − 0.7 = 0.19, and the output is the weighted sum 0.01 s1 + 0.1 s2 + 0.7 s3 + 0.19 s4. Ponder cost: 1 + 1 + 1 + 1 + 0.19 (one per executed step, plus the remainder), which is differentiable w.r.t. the halting probabilities!
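
A sketch of the halting rule just described, assuming the slide's example probabilities; in the real model the states and halting probabilities come from a learned RNN cell and halting unit, which are outside the scope of this snippet.

    import numpy as np

    def act_combine(states, halt_probs, eps=0.01):
        """Adaptive Computation Time: stop at the first step where the cumulative
        halting probability exceeds 1 - eps, give that step the remainder as its
        weight, and return (weighted output, ponder cost)."""
        weights, cum = [], 0.0
        for n, p in enumerate(halt_probs, start=1):
            if cum + p > 1.0 - eps:
                remainder = 1.0 - cum
                weights.append(remainder)
                output = sum(w * s for w, s in zip(weights, states))
                return output, n + remainder   # ponder cost: steps used + remainder
            weights.append(p)
            cum += p
        raise ValueError("halting probabilities never crossed the 1 - eps threshold")

    # the slide's example: probabilities 0.01, 0.1, 0.7, 0.5 -> weights 0.01, 0.1, 0.7, 0.19
    states = [np.full(3, float(i + 1)) for i in range(6)]   # stand-ins for s1..s6
    out, ponder = act_combine(states, [0.01, 0.1, 0.7, 0.5, 0.3, 0.2])
    # out ≈ 0.01*s1 + 0.1*s2 + 0.7*s3 + 0.19*s4, ponder ≈ 4.19
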
14. Residual block (http://arxiv.org/abs/1512.03385): image → group → group → group → avg. pool + fc, where each group is a stack of residual blocks.
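
A sketch of the residual computation the rest of the talk builds on; the block body f is a placeholder for the usual small convolution stack of http://arxiv.org/abs/1512.03385.

    def residual_block(x, f):
        """Residual block: the output is the input plus a learned residual."""
        return x + f(x)               # s_{i+1} = s_i + F(s_i)

    def resnet_group(x, block_fns):
        """One 'group': a sequence of residual blocks at the same resolution."""
        for f in block_fns:
            x = residual_block(x, f)
        return x

    # full network (schematically): image -> group -> group -> group -> avg. pool + fc
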
15. [figure] https://arxiv.org/abs/1603.09382 (avg. pool + fc)
16. [figure] https://arxiv.org/abs/1603.09382 (avg. pool + fc)
17. [figure] https://arxiv.org/abs/1603.09382 (avg. pool + fc)
18. Powerful regularizer; representations of the layers are compatible with each other. (https://arxiv.org/abs/1603.09382) [figure: avg. pool + fc]
19. ACT for one group of residual blocks: each residual block Fi produces an activation si (si = ResNet block activation), and a halting unit Hi predicts a halting probability, here 0.1, 0.1, 0.1, 0.9. The group halts after the fourth block (the remaining blocks F5, ... are not evaluated); the remainder is 1 − 0.1 − 0.1 − 0.1 = 0.7, the output is 0.1 s1 + 0.1 s2 + 0.1 s3 + 0.7 s4, and the ponder cost is 4.7.
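
The same mechanism as a short worked example over one group of residual blocks, using the halting probabilities from this slide; the blocks and halting units below are toys (an identity residual and a scripted probability sequence) just to reproduce the numbers.

    import numpy as np

    def act_residual_group(x, blocks, halting_units, eps=0.01):
        """Run residual blocks until the cumulative halting probability passes 1 - eps;
        the last executed block receives the remainder as its weight.
        (Assumes halting occurs before the group runs out of blocks.)"""
        s, cum, weights, activations = x, 0.0, [], []
        for F, H in zip(blocks, halting_units):
            s = s + F(s)                      # residual block F_i
            activations.append(s)
            h = H(s)                          # halting probability from unit H_i
            if cum + h > 1.0 - eps:
                weights.append(1.0 - cum)     # remainder
                break
            weights.append(h)
            cum += h
        output = sum(w * a for w, a in zip(weights, activations))
        ponder_cost = len(weights) + weights[-1]   # blocks evaluated + remainder
        return output, ponder_cost

    # slide's example: 0.1, 0.1, 0.1, 0.9 -> weights 0.1, 0.1, 0.1, 0.7; ponder ≈ 4.7
    probs = iter([0.1, 0.1, 0.1, 0.9, 0.5])
    blocks = [lambda s: 0.0 * s] * 5              # toy blocks: identity residual
    halting = [lambda s: next(probs)] * 5         # scripted halting probabilities
    out, ponder = act_residual_group(np.ones(3), blocks, halting)
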
20. [figure: images with high ponder cost vs. low ponder cost] ResNet-110, τ = 0.01
21. [figure: images with high ponder cost vs. low ponder cost] ResNet-101, τ = 0.001
22. Spatially adaptive version: halting probabilities are computed per spatial position (halting units H1, H2, ... attached to the blocks F1, F2, F3, ... of a group of residual blocks). A position that has not yet halted is updated by the next block; a position that has already halted copies its value from the previous block.
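
A sketch of the spatial variant on this slide, assuming an (H, W, C) feature map: every position keeps its own cumulative halting probability, active positions are updated by the next residual block, and halted positions copy their previous value. The block and halting functions are placeholders, and for clarity the block is evaluated everywhere; the actual savings come from skipping computation at halted positions.

    import numpy as np

    def sact_group(x, blocks, halting_units, eps=0.01):
        """Spatially Adaptive Computation Time over one group of residual blocks.
        x has shape (H, W, C); every spatial position halts independently."""
        H, W, _ = x.shape
        cum = np.zeros((H, W))                  # per-position cumulative halting probability
        remainder = np.ones((H, W))
        output = np.zeros_like(x)
        active = np.ones((H, W), dtype=bool)
        s = x
        for F, Hu in zip(blocks, halting_units):
            # update active positions; halted positions copy their value from the previous block
            s = np.where(active[..., None], s + F(s), s)
            h = Hu(s)                           # per-position halting probabilities, shape (H, W)
            halts_now = active & (cum + h > 1.0 - eps)
            weight = np.where(halts_now, remainder, np.where(active, h, 0.0))
            output += weight[..., None] * s
            cum += np.where(active, h, 0.0)
            remainder -= np.where(active & ~halts_now, h, 0.0)
            active &= ~halts_now
            if not active.any():                # every position has halted: stop early
                break
        output += np.where(active, remainder, 0.0)[..., None] * s   # force-halt leftovers
        return output
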
23. [figure: inside a residual block, halted positions are copied through unchanged]
24. Halting probability unit: hi = σ( 3×3 conv applied to si + linear model applied to the global average pooling of si ). This is a strict generalization of ACT (consider zero weights for the 3×3 conv).
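
A sketch of the halting unit on this slide: a per-position score from a 3×3 convolution plus a global score from a linear model on globally average-pooled features, added and passed through a sigmoid. The convolution is a naive NumPy loop just to keep the sketch self-contained; zeroing the 3×3 conv weights collapses it to plain (non-spatial) ACT.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def halting_unit(s, conv_w, lin_w, bias):
        """Per-position halting probabilities for an activation s of shape (H, W, C).
        conv_w: (3, 3, C) weights of a single-output 3x3 convolution (per-position term)
        lin_w:  (C,) weights of the linear model on globally pooled features (global term)
        bias:   scalar bias"""
        Hh, Ww, _ = s.shape
        padded = np.pad(s, ((1, 1), (1, 1), (0, 0)))
        conv = np.zeros((Hh, Ww))
        for y in range(Hh):                        # naive 3x3 convolution, one output channel
            for x in range(Ww):
                conv[y, x] = np.sum(padded[y:y + 3, x:x + 3, :] * conv_w)
        pooled = s.mean(axis=(0, 1))               # global average pooling -> (C,)
        global_term = float(lin_w @ pooled)        # linear model -> one scalar for the whole map
        return sigmoid(conv + global_term + bias)  # add the two terms, then the sigmoid
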
25. Two ways to train the model:
    ● From scratch
    ● Warm-up with a pretrained model (the following results use this)
    Important trick: initialize the biases of the halting probabilities with negative values.
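
A tiny worked illustration of why the negative bias helps (the value −3 is an arbitrary choice for illustration, not taken from the talk): it makes the initial halting probabilities small, so a warm-started model keeps using nearly all of its blocks until training moves the biases.

    import numpy as np

    bias = -3.0                                   # hypothetical negative initialization
    h0 = 1.0 / (1.0 + np.exp(-bias))              # initial halting probability ≈ 0.047
    blocks_until_halt = int(np.ceil((1.0 - 0.01) / h0))   # ≈ 21 blocks before the
                                                          # cumulative sum crosses 1 - eps
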
26. [results plot] ResNet-110, τ = 0.01
27. [results plot] ResNet-101, τ = 0.005
28. [results plot] ResNet-101, τ = 0.005
29. [results plot] ResNet-101, τ = 0.005
30. Suppose that the average number of blocks used in the groups is 3 - 3.9 - 13.7 - 3. Baseline: train a ResNet with 3 - 4 - 14 - 3 blocks from scratch, with “warming up” from the ResNet-101 network.
31. Apply the models to images of higher resolution than the training set: SACT improves scale invariance.
32. ● Train on ImageNet classification, fine-tune on COCO detection
    ● Apply the ponder cost penalty to the feature extractor

    Model              mAP    Feature extractor FLOPS
    ResNet v2 101      29.24  100%
    SACT, τ = 0.001    29.04  72.44%
    SACT, τ = 0.005    27.61  55.98%
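
A sketch of how the ponder cost enters training here: the detection (or classification) loss is penalized by τ times the average ponder cost of the feature extractor, so a larger τ trades accuracy for fewer FLOPS. The function and the toy numbers are illustrative, not from the paper.

    def total_loss(task_loss, ponder_costs, tau):
        """Task loss plus the ponder-cost penalty, weighted by tau."""
        return task_loss + tau * (sum(ponder_costs) / len(ponder_costs))

    # e.g. a detection loss with tau = 0.005 and per-group ponder costs from the feature extractor
    loss = total_loss(task_loss=1.3, ponder_costs=[4.7, 3.2, 12.9, 3.0], tau=0.005)
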
33. Saliency evaluation on the cat2000 dataset (http://saliency.mit.edu/home.html):
    ● No explicit supervision for attention!
    ● No center prior

    Model                       AUC-Judd
    ImageNet SACT, τ = 0.005    77%
    COCO SACT, τ = 0.005        80%
    One human                   65%
    Center prior                83%
    State of the art            87%

    Middle of the leaderboard. Kudos to Maxwell for evaluating!
34. “Spatially Adaptive Computation Time for Residual Networks”, to appear in CVPR 2017, https://arxiv.org/pdf/1612.02297.pdf
35. ● The idea of Adaptive Computation Time can be successfully used for computer vision
    ● Adaptive Computation Time
      ○ Dynamic number of layers in ResNet
    ● Spatially Adaptive Computation Time
      ○ Dynamic number of layers for different parts of the image
      ○ Attention maps for free :)
    ● Both models
      ○ Reduce the amount of computation
      ○ Can be implemented efficiently
      ○ Work on ImageNet classification (first attention models with this property?)
      ○ Work on COCO detection
