SlideShare a Scribd company logo
1 of 148
Download to read offline
@DocXavi
Module 5 - Lecture 10
Deep Convnets for
Global Recognition
30 March 2016
Xavier Giró-i-Nieto
http://pagines.uab.cat/mcv/
2
Densely linked slides
3
Acknowledgments
Santi
Pascual
4
Acknowledgments
Two deep lectures in M5
5
Global Scale
(today’s lecture)
Local Scale
(next lecture)
Deep ConvNets for Recognition at...
Previously, in M3...
6Slide credit: Jose M Àlvarez
Dog
Previously, in M3...
7Slide credit: Jose M Àlvarez
Dog
Learned
Representation
Outline for this session in M5...
8
Dog
Learned
Representation
Part I: End-to-end learning (E2E)
Outline for this session in M5...
9
Learned
Representation
Part I: End-to-end learning (E2E)
Task A
(eg. image classification)
Outline for this session in M5...
10
Task A
(eg. image classification)
Learned
Representation
Part I: End-to-end learning (E2E)
Task B
(eg. image retrieval)Part II: Off-the-shelf features
Outline for this session in M5...
11
Task A
(eg. image classification)
Learned
Representation
Part I: End-to-end learning (E2E)
Task B
(eg. image retrieval)Part II: Off-the-shelf features
E2E: Classification: LeNet-5
12
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-
2324.
E2E: Classification: LeNet-5
13
Demo: 3D Visualization of a Convolutional Neural Network
Harley, Adam W. "An Interactive Node-Link Visualization of Convolutional Neural Networks." In Advances in Visual Computing, pp. 867-877. Springer
International Publishing, 2015.
E2E: Classification: Similar to LeNet-5
14
Demo: Classify MNIST digits with a Convolutional Neural Network
“ConvNetJS is a Javascript library for
training Deep Learning models (mainly
Neural Networks) entirely in your browser.
Open a tab and you're training. No software
requirements, no compilers, no installations,
no GPUs, no sweat.”
E2E: Classification: Databases
15
Li Fei-Fei, “How we’re teaching computers to understand
pictures” TEDTalks 2014.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual
recognition challenge. arXiv preprint arXiv:1409.0575. [web]
16
E2E: Classification: Databases
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual
recognition challenge. arXiv preprint arXiv:1409.0575. [web]
17
Zhou, Bolei, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. "Learning deep features for scene recognition
using places database." In Advances in neural information processing systems, pp. 487-495. 2014. [web]
E2E: Classification: Databases
● 205 scene classes (categories).
● Images:
○ 2.5M train
○ 20.5k validation
○ 41k test
18
E2E: Classification: ImageNet ILSRVC
● 100 object classes (categories).
● Images:
○ 1.2 M train
○ 100k test.
E2E: Classification: ImageNet ILSRVC
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual
recognition challenge. arXiv preprint arXiv:1409.0575. [web]
● Predict 5 classes.
Slide credit:
Rob Fergus (NYU)
Image Classifcation 2012
-9.8%
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2014). Imagenet large scale visual recognition challenge. arXiv
preprint arXiv:1409.0575. [web] 20
E2E: Classification: ILSRVC
E2E: Classification: AlexNet (Supervision)
21Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
Orange
A Krizhevsky, I Sutskever, GE Hinton “Imagenet classification with deep convolutional neural networks” Part of: Advances in
Neural Information Processing Systems 25 (NIPS 2012)
22Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
E2E: Classification: AlexNet (Supervision)
23Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
E2E: Classification: AlexNet (Supervision)
24Image credit: Deep learning Tutorial (Stanford University)
E2E: Classification: AlexNet (Supervision)
25Image credit: Deep learning Tutorial (Stanford University)
E2E: Classification: AlexNet (Supervision)
26Image credit: Deep learning Tutorial (Stanford University)
E2E: Classification: AlexNet (Supervision)
27
Rectified
Linear
Unit
(non-linearity)
f(x) = max(0,x)
Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
E2E: Classification: AlexNet (Supervision)
28
Dot Product
Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
E2E: Classification: AlexNet (Supervision)
ImageNet Classification 2013
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv
preprint arXiv:1409.0575. [web]
Slide credit:
Rob Fergus (NYU)
29
E2E: Classification: ImageNet ILSRVC
The development of better
convnets is reduced to trial-and-
error.
30
E2E: Classification: Visualize: ZF
Visualization can help in
proposing better architectures.
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer
International Publishing.
“A convnet model that uses the same
components (filtering, pooling) but in
reverse, so instead of mapping pixels
to features does the opposite.”
Zeiler, Matthew D., Graham W. Taylor, and Rob Fergus. "Adaptive deconvolutional networks for mid and high level feature learning." Computer Vision
(ICCV), 2011 IEEE International Conference on. IEEE, 2011.
31
E2E: Classification: Visualize: ZF
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer
International Publishing.
DeconvN
et
Conv
Net
32
E2E: Classification: Visualize: ZF
33
E2E: Classification: Visualize: ZF
34
E2E: Classification: Visualize: ZF
35
E2E: Classification: Visualize: ZF
“To examine a given convnet activation, we set all other activations in the layer to zero and pass the feature
maps as input to the attached deconvnet layer.”
36
E2E: Classification: Visualize: ZF
37
E2E: Classification: Visualize: ZF
“(i) Unpool: In the convnet, the max pooling operation is non-invertible, however we can obtain an approximate
inverse by recording the locations of the maxima within each pooling region in a set of switch variables.”
38
E2E: Classification: Visualize: ZF
XX
“(ii) Rectification: The convnet uses ReLU non-linearities, which rectify the feature maps thus ensuring the
feature maps are always positive.”
39
E2E: Classification: Visualize: ZF
“(iii) Filtering: The convnet uses learned filters to convolve the feature maps from the previous layer. To
approximately invert this, the deconvnet uses transposed versions of the same filters (as other autoencoder
models, such as RBMs), but applied to the rectified maps, not the output of the layer beneath. In practice this
means flipping each filter vertically and horizontally.
XX
XX
40
E2E: Classification: Visualize: ZF
“(iii) Filtering: The convnet uses learned filters to convolve the feature maps from the previous layer. To
approximately invert this, the deconvnet uses transposed versions of the same filters (as other autoencoder
models, such as RBMs), but applied to the rectified maps, not the output of the layer beneath. In practice this
means flipping each filter vertically and horizontally.
XX
XX
41
E2E: Classification: Visualize: ZF
42
Top 9 activations in a random subset of feature maps
across the validation data, projected down to pixel space
using our deconvolutional network approach.
Corresponding image patches.
E2E: Classification: Visualize: ZF
43
E2E: Classification: Visualize: ZF
44
E2E: Classification: Visualize: ZF
45
E2E: Classification: Visualize: ZF
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer
International Publishing.
46
E2E: Classification: Visualize: ZF
47
The smaller stride (2 vs 4) and filter size (7x7 vs 11x11) results in more distinctive features and fewer “dead"
features.
AlexNet (Layer 1) Clarifai (Layer 1)
E2E: Classification: Visualize: ZF
48
Cleaner features in Clarifai, without the aliasing artifacts caused by the stride 4 used in AlexNet.
AlexNet (Layer 2) Clarifai (Layer 2)
E2E: Classification: Visualize: ZF
49
Regularization with dropout:
Reduction of overfitting by setting
to zero the output of a portion
(typically 50%) of the each
intermediate neuron.
Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of
feature detectors. arXiv preprint arXiv:1207.0580.
Chicago
E2E: Classification: Dropout: ZF
50
E2E: Classification: Visualize: ZF
51
E2E: Classification: Ensembles: ZF
ImageNet Classification 2013
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv
preprint arXiv:1409.0575. [web]
-5%
Slide credit:
Rob Fergus (NYU)
52
E2E: Classification: ImageNet ILSRVC
ImageNet Classification 2013
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv
preprint arXiv:1409.0575. [web]
-5%
Slide credit:
Rob Fergus (NYU)
53
E2E: Classification: ImageNet ILSRVC
E2E: Classification
54
E2E: Classification: GoogLeNet
55Movie: Inception (2010)
E2E: Classification: GoogLeNet
56
● 22 layers, but 12 times fewer parameters than AlexNet.
Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov,
Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions."
E2E: Classification: GoogLeNet
57
● Challenges of going deeper:
○ Overfitting, due to the increase amount of parameters.
○ Inefficient computation if most weights end up close to zero.
Solution
Sparsity
How ?
Inception modules
E2E: Classification: GoogLeNet
58
E2E: Classification: GoogLeNet
59
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
E2E: Classification: GoogLeNet
60
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
E2E: Classification: GoogLeNet (NiN)
61
3x3 and 5x5 convolutions deal with
different scales.
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides]
62
1x1 convolutions does dimensionality reduction (c3<c2) and
accounts for rectified linear units (ReLU).
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides]
E2E: Classification: GoogLeNet (NiN)
63
In NiN, the Cascaded 1x1 Convolutions compute reductions after the
convolutions.
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides]
E2E: Classification: GoogLeNet (NiN)
E2E: Classification: GoogLeNet
64
In GoogLeNet, the Cascaded 1x1 Convolutions compute reductions before the
expensive 3x3 and 5x5 convolutions.
E2E: Classification: GoogLeNet
65
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
E2E: Classification: GoogLeNet
66
3x3 max pooling introduces somewhat spatial invariance,
and has proven a benefitial effect by adding an alternative
parallel path.
E2E: Classification: GoogLeNet
67
Two Softmax Classifiers at intermediate layers combat the vanishing gradient while
providing regularization at training time.
...and no fully connected layers needed !
E2E: Classification: GoogLeNet
68NVIDIA, “NVIDIA and IBM CLoud Support ImageNet Large Scale Visual Recognition Challenge” (2015)
E2E: Classification: GoogLeNet
69
Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent
Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." CVPR 2015. [video] [slides] [poster]
E2E: Classification: VGG
70
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition."
International Conference on Learning Representations (2015). [video] [slides] [project]
E2E: Classification: VGG
71
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition."
International Conference on Learning Representations (2015). [video] [slides] [project]
E2E: Classification: VGG: 3x3 Stacks
72
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image
recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
E2E: Classification: VGG
73
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image
recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
● No poolings between some convolutional layers.
● Convolution strides of 1 (no skipping).
E2E: Classification
74
3.6% top 5 error…
with 152 layers !!
E2E: Classification: ResNet
75
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385
(2015). [slides]
E2E: Classification: ResNet
76
● Deeper networks (34 is deeper than 18) are more difficult to train.
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385
(2015). [slides]
Thin curves: training error
Bold curves: validation error
E2E: Classification: ResNet
77
● Residual learning: reformulate the layers as learning residual functions with
reference to the layer inputs, instead of learning unreferenced functions
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385
(2015). [slides]
E2E: Classification: ResNet
78
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385
(2015). [slides]
79
E2E: Classification: Humans
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual
recognition challenge. arXiv preprint arXiv:1409.0575. [web]
80
E2E: Classification: Humans
“Is this a Border terrier” ?
Crowdsourcing
Yes
No
● Binary ground truth annotation from the crowd.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual
recognition challenge. arXiv preprint arXiv:1409.0575. [web]
81
● Annotation Problems:
Carlier, Axel, Amaia Salvador, Ferran Cabezas, Xavier Giro-i-Nieto, Vincent Charvillat, and Oge Marques. "Assessment of
crowdsourcing and gamification loss in user-assisted object segmentation." Multimedia Tools and Applications (2015): 1-28.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual
recognition challenge. arXiv preprint arXiv:1409.0575. [web]
Crowdsource loss (0.3%) More than 5 objects classes
E2E: Classification: Humans
82
Andrej Karpathy, “What I learned from competing against a computer on ImageNet” (2014)
● Test data collection from one human. [interface]
E2E: Classification: Humans
83
Andrej Karpathy, “What I learned from competing against a computer on ImageNet” (2014)
● Test data collection from one human. [interface]
“Aww, a cute dog! Would you like to spend 5 minutes scrolling through 120
breeds of dog to guess what species it is ?”
E2E: Classification: Humans
E2E: Classification: Humans
84
NVIDIA, “Mocha.jl: Deep Learning for Julia” (2015)
ResNet
8585
Let’s play a
game!
86
87
What have you seen?
88
Tower
89
Tower
House
90
Tower
House
Rocks
91
E2E: Saliency
Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB 2015)
92
Eye Tracker Mouse Click
Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB 2015)
E2E: Saliency
93
E2E: Saliency: JuntingNet
JuntingNet
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
94
JuntingNet
DATA
iSun [Xu’15]
SALICON [Jiang’15]
E2E: Saliency: JuntingNet
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
95
TRAIN VALIDATION TEST
SALICON [Jiang’15] 10,000 5,000 5,000
iSun [Xu’15] 6,000 926 2,000
CAT2000 [Borji’15] 2,000 - 2,000
MIT300 [Judd’12] 300 - -Large
Scale
E2E: Saliency: JuntingNet
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
96
ARCHITECTURE
[Pan’15]
E2E: Saliency: JuntingNet
JuntingNet
DATA
iSun [Xu’15]
SALICON [Jiang’15]
97
Upsample
+ filter
2D
map
96x96 2340=48x48
IMAGE
INPUT
(RGB)
E2E: Saliency: JuntingNet
98
Upsample
+ filter
2D
map
96x96 2340=48x48
3 CONV LAYERS
E2E: Saliency: JuntingNet
99
Upsample
+ filter
2D
map
96x96 2340=48x48
2 DENSE
LAYERS
E2E: Saliency: JuntingNet
100
Upsample
+ filter
2D
map
96x96 2340=48x48
E2E: Saliency: JuntingNet
101
JuntingNet
ARCHITECTURE
[Pan’15]
DATA
iSun [Xu’15]
SALICON [Jiang’15]
SOFTWARE
[Bergstra’10]
[Bastien’12]
E2E: Saliency: JuntingNet
102
Loss function Mean Square Error (MSE)
Weight initialization Gaussian distribution
Learning rate 0.03 to 0.0001
Mini batch size 128
Training time 7h (SALICON) / 4h (iSUN)
Acceleration SGD+ nesterov momentum (0.9)
Regularisation Maxout norm
GPU NVidia GTX 980
E2E: Saliency: JuntingNet
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
103
Number of iterations
(Training time)
● Back-propagation with the Euclidean distance.
● Training curve for the SALICON database.
E2E: Saliency: JuntingNet
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
104
JuntingNetGround TruthPixels
E2E: Saliency: JuntingNet: iSUN
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
105
JuntingNetGround TruthPixels
E2E: Saliency: JuntingNet: iSUN
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
106
Results from CVPR LSUN Challenge 2015
E2E: Saliency: JuntingNet: iSUN
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
107
JuntingNetGround TruthPixels
E2E: Saliency: JuntingNet: SALICON
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
108
JuntingNetGround TruthPixels
E2E: Saliency: JuntingNet: SALICON
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
109
Results from CVPR LSUN Challenge 2015
E2E: Saliency: JuntingNet: SALICON
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
110
E2E: Saliency: JuntingNet
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
Outline for this session in M5...
111
Part I: End-to-end learning (E2E)
Domain B
Fine-tuned
Learned
Representation
Part I’: End-to-End Fine-Tuning (FT)
Part I: End-to-end learning (E2E)
Domain ALearned
Representation
Part I: End-to-end learning (E2E)
Transfer
112
E2E: Fine-tuning
Fine-tuning a pre-trained network
Slide credit: Victor Campos, “Layer-wise CNN surgery for Visual Sentiment Prediction” (ETSETB 2015)
113
E2E: Fine-tuning
Slide credit: Victor Campos, “Layer-wise CNN surgery for Visual Sentiment Prediction” (ETSETB 2015)
Fine-tuning a pre-trained network
114
E2E: Fine-tuning: Sentiments
CNN
Campos, Victor, Amaia Salvador, Xavier Giro-i-Nieto, and Brendan Jou. "Diving Deep into Sentiment: Understanding
Fine-tuned CNNs for Visual Sentiment Prediction." In Proceedings of the 1st International Workshop on Affect &
Sentiment in Multimedia, pp. 57-62. ACM, 2015.
115
Campos, Victor, Xavier Giro-i-Nieto, and Brendan Jou. “From pixels to sentiments” (Submitted)
Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully Convolutional Networks for Semantic
Segmentation." CVPR 2015
E2E: Fine-tuning: Sentiments
True positive True negative False positive False negative
Visualizations with fully convolutional networks.
116
E2E: Fine-tuning: Cultural events
ChaLearn Workshop
A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., “Cultural
Event Recognition with Visual ConvNets and Temporal Models”, in CVPR ChaLearn Looking at People
Workshop 2015, 2015. [slides]
Outline for this session in M5...
117
Task A
(eg. image classification)
Learned
Representation
Part I: End-to-end learning (E2E)
Task B
(eg. image retrieval)
Part II: Off-The-Shelf features (OTS)
118
Razavian, Ali, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. "CNN features off-the-shelf:
an astounding baseline for recognition." CVPRW 2014
Off-The-Shelf (OTS) Features
● Intermediate features can be used as regular visual descriptors for any task.
119
Off-The-Shelf (OTS) Features
Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014
120
OTS: Classification: Razavian
Razavian, Ali, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. "CNN features off-the-shelf:
an astounding baseline for recognition." CVPRW 2014
Pascal VOC 2007
121
OTS: Classification: Return of devil
Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A.. Return of the devil in the details: Delving
deep into convolutional nets. BMVC 2014
Classifier
L2-normalization
Accuracy
+5%
Three representative architectures considered:
AlexNet
ZF
OverFeat
5 days (fast)
3 weeks (slow)
@ NVIDIA GTX Titan GPU
122
OTS: Classification: Return of devil
Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A.. Return of the devil in the details: Delving
deep into convolutional nets. BMVC 2014
F
C
123
OTS: Classification: Return of devil
Data augmentation
Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A.. Return of the devil in the details: Delving
deep into convolutional nets. BMVC 2014
124
OTS: Classification: Return of devil
Fisher
Kernels
(FK)
ConvNets
(CNN)
Color
Gray Scale (GS)
Accuracy
-2.5%
125
OTS: Classification: Return of devil
Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A.. Return of the devil in the details: Delving
deep into convolutional nets. BMVC 2014
Dimensionality reduction by retraining the last layer to smaller sizes.
Accuracy
-2%
Size
x32
126
OTS: Classification: Return of devil
Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A.. Return of the devil in the details: Delving
deep into convolutional nets. BMVC 2014
Ranking
Summary of the paper by Amaia Salvador on Bitsearch.
Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014. Springer
International Publishing, 2014. 584-599.
127
OTS: Retrieval
Oxford Buildings Inria Holidays UKB
128
OTS: Retrieval
Pooled from the network from Krizhevsky et. al. pretrained with images from
ImageNet.
129
OTS: Retrieval: FC layers
Summary of the paper by Amaia Salvador on Bitsearch.
Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014.
Off-the-shelf CNN descriptors from fully connected layers show useful but not
superior (w.r.t. FV, VLAD, Sparse Coding,...)
130Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014.
OTS: Retrieval: FC layers
Razavian et al, A baseline for visual instance retrieval with deep convolutional networks, ICLR 2015.
131
Convolutional layers have shown better performance than fully connected ones.
OTS: Retrieval: Conv layers
OTS: Retrieval: Conv layers
Razavian et al, A baseline for visual instance retrieval with deep convolutional networks, ICLR 2015.
132
Spatial Search, (extract N local descriptor from predefined locations) increases
performance at computational cost.
Medium memory footprints
Razavian et al, A baseline for visual instance retrieval with deep convolutional networks, ICLR 2015.
133
OTS: Retrieval: Conv layers
Mohedano et al, Bag of Convolutional Local Features for Scalable Instance Search (submitted)
134
...
Instance Retrieval
(Instance: Object, Building, Person, Place…)
OTS: Retrieval: Bag of Words (BoW)
Mohedano et al, Bags of Local Convolutional Features for Scalable Instance Search (ICMR 2016)
135
OTS: Retrieval: Bag of Words (BoW)
An assignment map of BoF quantized features can be defined over the spatial
locations of the convolutional feature maps.
136Mohedano et al, Bag of Convolutional Local Features for Scalable Instance Search (submitted)
OTS: Retrieval: Bag of Words (BoW)
137
OTS: Summarization
Bolaños M, Mestre R, Talavera E, Giró-i-Nieto X, Radeva P. Visual Summary of Egocentric Photostreams by Representative
Keyframes. In: IEEE International Workshop on Wearable and Ego-vision Systems for Augmented Experience (WEsAX) 2015. Turin, Italy:
2015
Clustering based on Euclidean distance over FC7 features from AlexNet.
138
OTS: Reinforcement learning: DeepQ
Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra,
and Martin Riedmiller. "Playing atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602
(2013).
(Google) DeepMind
139
OTS: Reinforcement learning: Deep Q
Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin
Riedmiller. "Playing atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).
Demo: Deep Q learning Demo
140
OTS: Reinforcement learning
Caicedo, Juan C., and Svetlana Lazebnik. "Active object localization with deep reinforcement learning."
ICCV 2015 [Slides by Miriam Bellver]
Other: Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. "Playing
atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).
Object is localized based on visual features from AlexNet FC6.
ConvNets: Learn more
141
The New York Times: “The Race Is On to Control Artificial Intelligence, and Tech’s
Future” (25/03/2016)
142
Keras http://keras.io/
Caffe http://caffe.berkeleyvision.org/
Torch (Overfeat) http://torch.ch/
Theano http://deeplearning.net/software/theano/
Tensor Flow https://www.tensorflow.org/
MatconvNet (VLFeat) http://www.vlfeat.org/matconvnet/
CNTK (Mcrosoft) http://www.cntk.ai/
ConvNets: Frameworks
143
ConvNets: Frameworks
Source: @fchollet
Stanford course:
CS231n:
Convolutional Neural
Networks for Visual
Recognition
144
ConvNets: Learn more
145
● Reading Group on Tuesdays (UPC) and Wednesdays (UB) at 11am.
Schedule, list of paper, slides & video(s): https://github.com/imatge-upc/readcv
ConvNets: Learn more
Open to MCV students.
E-mail me at xavier.
giro@upc.edu
before attending :)
146
● Summer Course “Deep learning for computer vision” (4-8 July 2016)
(Very) temptative program at: https://github.com/imatge-upc/etsetb-2016-dlcv
ConvNets: Learn more
E-mail me at xavier.
giro@upc.edu for a
free spot
(subject to
availability)
147
● Broader and more friendly slides for dissemination.
[Available on slideshare.net]
ConvNets: Learn more
148
Thank you !
https://imatge.upc.edu/web/people/xavier-giro
https://twitter.com/DocXavi
https://www.facebook.com/ProfessorXavi
xavier.giro@upc.edu
Xavier Giró-i-Nieto

More Related Content

What's hot

AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...GeeksLab Odessa
 
Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks
Temporal Activity Detection in Untrimmed Videos with Recurrent Neural NetworksTemporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks
Temporal Activity Detection in Untrimmed Videos with Recurrent Neural NetworksUniversitat Politècnica de Catalunya
 
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...Universitat Politècnica de Catalunya
 
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN BarcelonaDeep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN BarcelonaUniversitat Politècnica de Catalunya
 
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...Universitat Politècnica de Catalunya
 
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...Universitat Politècnica de Catalunya
 
Intro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksIntro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksMark Scully
 
DeepFix: a fully convolutional neural network for predicting human fixations...
DeepFix:  a fully convolutional neural network for predicting human fixations...DeepFix:  a fully convolutional neural network for predicting human fixations...
DeepFix: a fully convolutional neural network for predicting human fixations...Universitat Politècnica de Catalunya
 
Devil in the Details: Analysing the Performance of ConvNet Features
Devil in the Details: Analysing the Performance of ConvNet FeaturesDevil in the Details: Analysing the Performance of ConvNet Features
Devil in the Details: Analysing the Performance of ConvNet FeaturesKen Chatfield
 
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)Universitat Politècnica de Catalunya
 
Yann le cun
Yann le cunYann le cun
Yann le cunYandex
 
Promises of Deep Learning
Promises of Deep LearningPromises of Deep Learning
Promises of Deep LearningDavid Khosid
 
Deep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksDeep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksChristian Perone
 
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani WithanawasamScene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani WithanawasamWithTheBest
 
Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...
Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...
Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...Universitat Politècnica de Catalunya
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Universitat Politècnica de Catalunya
 
Learning with Videos (D4L4 2017 UPC Deep Learning for Computer Vision)
Learning with Videos  (D4L4 2017 UPC Deep Learning for Computer Vision)Learning with Videos  (D4L4 2017 UPC Deep Learning for Computer Vision)
Learning with Videos (D4L4 2017 UPC Deep Learning for Computer Vision)Universitat Politècnica de Catalunya
 

What's hot (20)

AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...
 
Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks
Temporal Activity Detection in Untrimmed Videos with Recurrent Neural NetworksTemporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks
Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks
 
Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)
Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)
Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)
 
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
 
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN BarcelonaDeep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
 
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
 
Welcome (D1L1 2017 UPC Deep Learning for Computer Vision)
Welcome (D1L1 2017 UPC Deep Learning for Computer Vision)Welcome (D1L1 2017 UPC Deep Learning for Computer Vision)
Welcome (D1L1 2017 UPC Deep Learning for Computer Vision)
 
Deep Learning for Video: Action Recognition (UPC 2018)
Deep Learning for Video: Action Recognition (UPC 2018)Deep Learning for Video: Action Recognition (UPC 2018)
Deep Learning for Video: Action Recognition (UPC 2018)
 
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
 
Intro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksIntro To Convolutional Neural Networks
Intro To Convolutional Neural Networks
 
DeepFix: a fully convolutional neural network for predicting human fixations...
DeepFix:  a fully convolutional neural network for predicting human fixations...DeepFix:  a fully convolutional neural network for predicting human fixations...
DeepFix: a fully convolutional neural network for predicting human fixations...
 
Devil in the Details: Analysing the Performance of ConvNet Features
Devil in the Details: Analysing the Performance of ConvNet FeaturesDevil in the Details: Analysing the Performance of ConvNet Features
Devil in the Details: Analysing the Performance of ConvNet Features
 
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
 
Yann le cun
Yann le cunYann le cun
Yann le cun
 
Promises of Deep Learning
Promises of Deep LearningPromises of Deep Learning
Promises of Deep Learning
 
Deep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksDeep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural Networks
 
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani WithanawasamScene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
 
Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...
Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...
Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
 
Learning with Videos (D4L4 2017 UPC Deep Learning for Computer Vision)
Learning with Videos  (D4L4 2017 UPC Deep Learning for Computer Vision)Learning with Videos  (D4L4 2017 UPC Deep Learning for Computer Vision)
Learning with Videos (D4L4 2017 UPC Deep Learning for Computer Vision)
 

Viewers also liked

Case Study of Convolutional Neural Network
Case Study of Convolutional Neural NetworkCase Study of Convolutional Neural Network
Case Study of Convolutional Neural NetworkNamHyuk Ahn
 
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015Jia-Bin Huang
 
Backpropagation in Convolutional Neural Network
Backpropagation in Convolutional Neural NetworkBackpropagation in Convolutional Neural Network
Backpropagation in Convolutional Neural NetworkHiroshi Kuwajima
 
Paper overview: "Deep Residual Learning for Image Recognition"
Paper overview: "Deep Residual Learning for Image Recognition"Paper overview: "Deep Residual Learning for Image Recognition"
Paper overview: "Deep Residual Learning for Image Recognition"Ilya Kuzovkin
 
Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Langua...
Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Langua...Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Langua...
Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Langua...Universitat Politècnica de Catalunya
 
20150703.journal club
20150703.journal club20150703.journal club
20150703.journal clubHayaru SHOUNO
 
Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...
Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...
Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...Universitat Politècnica de Catalunya
 
Shuffle and learn: Unsupervised Learning using Temporal Order Verification (U...
Shuffle and learn: Unsupervised Learning using Temporal Order Verification (U...Shuffle and learn: Unsupervised Learning using Temporal Order Verification (U...
Shuffle and learn: Unsupervised Learning using Temporal Order Verification (U...Universitat Politècnica de Catalunya
 
Convolutional neural network in practice
Convolutional neural network in practiceConvolutional neural network in practice
Convolutional neural network in practice남주 김
 
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...Universitat Politècnica de Catalunya
 
Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model (UP...
Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model (UP...Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model (UP...
Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model (UP...Universitat Politècnica de Catalunya
 
Neural Machine Translation (D3L4 Deep Learning for Speech and Language UPC 2017)
Neural Machine Translation (D3L4 Deep Learning for Speech and Language UPC 2017)Neural Machine Translation (D3L4 Deep Learning for Speech and Language UPC 2017)
Neural Machine Translation (D3L4 Deep Learning for Speech and Language UPC 2017)Universitat Politècnica de Catalunya
 
Object Detection and Recognition
Object Detection and Recognition Object Detection and Recognition
Object Detection and Recognition Intel Nervana
 
cvpaper.challenge@CVPR2015(Deep Neural Networks)
cvpaper.challenge@CVPR2015(Deep Neural Networks)cvpaper.challenge@CVPR2015(Deep Neural Networks)
cvpaper.challenge@CVPR2015(Deep Neural Networks)cvpaper. challenge
 
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...Universitat Politècnica de Catalunya
 

Viewers also liked (19)

Case Study of Convolutional Neural Network
Case Study of Convolutional Neural NetworkCase Study of Convolutional Neural Network
Case Study of Convolutional Neural Network
 
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
 
Backpropagation in Convolutional Neural Network
Backpropagation in Convolutional Neural NetworkBackpropagation in Convolutional Neural Network
Backpropagation in Convolutional Neural Network
 
Paper overview: "Deep Residual Learning for Image Recognition"
Paper overview: "Deep Residual Learning for Image Recognition"Paper overview: "Deep Residual Learning for Image Recognition"
Paper overview: "Deep Residual Learning for Image Recognition"
 
Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Langua...
Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Langua...Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Langua...
Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Langua...
 
CNN Tutorial
CNN TutorialCNN Tutorial
CNN Tutorial
 
20150703.journal club
20150703.journal club20150703.journal club
20150703.journal club
 
Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...
Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...
Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...
 
Shuffle and learn: Unsupervised Learning using Temporal Order Verification (U...
Shuffle and learn: Unsupervised Learning using Temporal Order Verification (U...Shuffle and learn: Unsupervised Learning using Temporal Order Verification (U...
Shuffle and learn: Unsupervised Learning using Temporal Order Verification (U...
 
The impact of visual saliency prediction in image classification
The impact of visual saliency prediction in image classificationThe impact of visual saliency prediction in image classification
The impact of visual saliency prediction in image classification
 
Convolutional neural network in practice
Convolutional neural network in practiceConvolutional neural network in practice
Convolutional neural network in practice
 
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...
 
Speaker ID II (D4L1 Deep Learning for Speech and Language UPC 2017)
Speaker ID II (D4L1 Deep Learning for Speech and Language UPC 2017)Speaker ID II (D4L1 Deep Learning for Speech and Language UPC 2017)
Speaker ID II (D4L1 Deep Learning for Speech and Language UPC 2017)
 
Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model (UP...
Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model (UP...Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model (UP...
Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model (UP...
 
Neural Machine Translation (D3L4 Deep Learning for Speech and Language UPC 2017)
Neural Machine Translation (D3L4 Deep Learning for Speech and Language UPC 2017)Neural Machine Translation (D3L4 Deep Learning for Speech and Language UPC 2017)
Neural Machine Translation (D3L4 Deep Learning for Speech and Language UPC 2017)
 
Object Detection and Recognition
Object Detection and Recognition Object Detection and Recognition
Object Detection and Recognition
 
Deep Learning for Computer Vision: Visualization (UPC 2016)
Deep Learning for Computer Vision: Visualization (UPC 2016)Deep Learning for Computer Vision: Visualization (UPC 2016)
Deep Learning for Computer Vision: Visualization (UPC 2016)
 
cvpaper.challenge@CVPR2015(Deep Neural Networks)
cvpaper.challenge@CVPR2015(Deep Neural Networks)cvpaper.challenge@CVPR2015(Deep Neural Networks)
cvpaper.challenge@CVPR2015(Deep Neural Networks)
 
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
 

Similar to Deep convnets for global recognition (Master in Computer Vision Barcelona 2016)

Fcv learn yu
Fcv learn yuFcv learn yu
Fcv learn yuzukun
 
Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Dmytro Mishkin
 
Multimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QAMultimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QAJin-Hwa Kim
 
#4 Convolutional Neural Networks for Natural Language Processing
#4 Convolutional Neural Networks for Natural Language Processing#4 Convolutional Neural Networks for Natural Language Processing
#4 Convolutional Neural Networks for Natural Language ProcessingBerlin Language Technology
 
What's Wrong With Deep Learning?
What's Wrong With Deep Learning?What's Wrong With Deep Learning?
What's Wrong With Deep Learning?Philip Zheng
 
Deep learning in Computer Vision
Deep learning in Computer VisionDeep learning in Computer Vision
Deep learning in Computer VisionDavid Dao
 
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019Universitat Politècnica de Catalunya
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...PyData
 
Deep Learning Hardware: Past, Present, & Future
Deep Learning Hardware: Past, Present, & FutureDeep Learning Hardware: Past, Present, & Future
Deep Learning Hardware: Past, Present, & FutureRouyun Pan
 
Inspirational applications of deep learning
Inspirational applications of deep learningInspirational applications of deep learning
Inspirational applications of deep learningssh1
 
UNetEliyaLaialy (2).pptx
UNetEliyaLaialy (2).pptxUNetEliyaLaialy (2).pptx
UNetEliyaLaialy (2).pptxNoorUlHaq47
 
Visual Object Tracking: Action-Decision Networks (ADNets)
Visual Object Tracking: Action-Decision Networks (ADNets)Visual Object Tracking: Action-Decision Networks (ADNets)
Visual Object Tracking: Action-Decision Networks (ADNets)Yehya Abouelnaga
 
Visualizing and Understanding Convolutional Networks
Visualizing and Understanding Convolutional NetworksVisualizing and Understanding Convolutional Networks
Visualizing and Understanding Convolutional NetworksWilly Marroquin (WillyDevNET)
 
CNN Structure: From LeNet to ShuffleNet
CNN Structure: From LeNet to ShuffleNetCNN Structure: From LeNet to ShuffleNet
CNN Structure: From LeNet to ShuffleNetDalin Zhang
 

Similar to Deep convnets for global recognition (Master in Computer Vision Barcelona 2016) (20)

Fcv learn yu
Fcv learn yuFcv learn yu
Fcv learn yu
 
med_poster_spie
med_poster_spiemed_poster_spie
med_poster_spie
 
Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...
 
Multimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QAMultimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QA
 
convnets.pptx
convnets.pptxconvnets.pptx
convnets.pptx
 
#4 Convolutional Neural Networks for Natural Language Processing
#4 Convolutional Neural Networks for Natural Language Processing#4 Convolutional Neural Networks for Natural Language Processing
#4 Convolutional Neural Networks for Natural Language Processing
 
What's Wrong With Deep Learning?
What's Wrong With Deep Learning?What's Wrong With Deep Learning?
What's Wrong With Deep Learning?
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
conv_nets.pptx
conv_nets.pptxconv_nets.pptx
conv_nets.pptx
 
Lec11 object-re-id
Lec11 object-re-idLec11 object-re-id
Lec11 object-re-id
 
Deep learning in Computer Vision
Deep learning in Computer VisionDeep learning in Computer Vision
Deep learning in Computer Vision
 
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
 
Convolutional neural networks
Convolutional neural  networksConvolutional neural  networks
Convolutional neural networks
 
Deep Learning Hardware: Past, Present, & Future
Deep Learning Hardware: Past, Present, & FutureDeep Learning Hardware: Past, Present, & Future
Deep Learning Hardware: Past, Present, & Future
 
Inspirational applications of deep learning
Inspirational applications of deep learningInspirational applications of deep learning
Inspirational applications of deep learning
 
UNetEliyaLaialy (2).pptx
UNetEliyaLaialy (2).pptxUNetEliyaLaialy (2).pptx
UNetEliyaLaialy (2).pptx
 
Visual Object Tracking: Action-Decision Networks (ADNets)
Visual Object Tracking: Action-Decision Networks (ADNets)Visual Object Tracking: Action-Decision Networks (ADNets)
Visual Object Tracking: Action-Decision Networks (ADNets)
 
Visualizing and Understanding Convolutional Networks
Visualizing and Understanding Convolutional NetworksVisualizing and Understanding Convolutional Networks
Visualizing and Understanding Convolutional Networks
 
CNN Structure: From LeNet to ShuffleNet
CNN Structure: From LeNet to ShuffleNetCNN Structure: From LeNet to ShuffleNet
CNN Structure: From LeNet to ShuffleNet
 

More from Universitat Politècnica de Catalunya

The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...Universitat Politècnica de Catalunya
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoUniversitat Politècnica de Catalunya
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosUniversitat Politècnica de Catalunya
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Universitat Politècnica de Catalunya
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Universitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Universitat Politècnica de Catalunya
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Universitat Politècnica de Catalunya
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Universitat Politècnica de Catalunya
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Universitat Politècnica de Catalunya
 

More from Universitat Politècnica de Catalunya (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
 
Open challenges in sign language translation and production
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and production
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
 

Recently uploaded

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

Deep convnets for global recognition (Master in Computer Vision Barcelona 2016)

  • 1. @DocXavi Module 5 - Lecture 10 Deep Convnets for Global Recognition 30 March 2016 Xavier Giró-i-Nieto http://pagines.uab.cat/mcv/
  • 5. Two deep lectures in M5 5 Global Scale (today’s lecture) Local Scale (next lecture) Deep ConvNets for Recognition at...
  • 6. Previously, in M3... 6Slide credit: Jose M Àlvarez Dog
  • 7. Previously, in M3... 7Slide credit: Jose M Àlvarez Dog Learned Representation
  • 8. Outline for this session in M5... 8 Dog Learned Representation Part I: End-to-end learning (E2E)
  • 9. Outline for this session in M5... 9 Learned Representation Part I: End-to-end learning (E2E) Task A (eg. image classification)
  • 10. Outline for this session in M5... 10 Task A (eg. image classification) Learned Representation Part I: End-to-end learning (E2E) Task B (eg. image retrieval)Part II: Off-the-shelf features
  • 11. Outline for this session in M5... 11 Task A (eg. image classification) Learned Representation Part I: End-to-end learning (E2E) Task B (eg. image retrieval)Part II: Off-the-shelf features
  • 12. E2E: Classification: LeNet-5 12 LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278- 2324.
  • 13. E2E: Classification: LeNet-5 13 Demo: 3D Visualization of a Convolutional Neural Network Harley, Adam W. "An Interactive Node-Link Visualization of Convolutional Neural Networks." In Advances in Visual Computing, pp. 867-877. Springer International Publishing, 2015.
  • 14. E2E: Classification: Similar to LeNet-5 14 Demo: Classify MNIST digits with a Convolutional Neural Network “ConvNetJS is a Javascript library for training Deep Learning models (mainly Neural Networks) entirely in your browser. Open a tab and you're training. No software requirements, no compilers, no installations, no GPUs, no sweat.”
  • 15. E2E: Classification: Databases 15 Li Fei-Fei, “How we’re teaching computers to understand pictures” TEDTalks 2014. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]
  • 16. 16 E2E: Classification: Databases Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]
  • 17. 17 Zhou, Bolei, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. "Learning deep features for scene recognition using places database." In Advances in neural information processing systems, pp. 487-495. 2014. [web] E2E: Classification: Databases ● 205 scene classes (categories). ● Images: ○ 2.5M train ○ 20.5k validation ○ 41k test
  • 18. 18 E2E: Classification: ImageNet ILSRVC ● 100 object classes (categories). ● Images: ○ 1.2 M train ○ 100k test.
  • 19. E2E: Classification: ImageNet ILSRVC Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web] ● Predict 5 classes.
  • 20. Slide credit: Rob Fergus (NYU) Image Classifcation 2012 -9.8% Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2014). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web] 20 E2E: Classification: ILSRVC
  • 21. E2E: Classification: AlexNet (Supervision) 21Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015) Orange A Krizhevsky, I Sutskever, GE Hinton “Imagenet classification with deep convolutional neural networks” Part of: Advances in Neural Information Processing Systems 25 (NIPS 2012)
  • 22. 22Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015) E2E: Classification: AlexNet (Supervision)
  • 23. 23Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015) E2E: Classification: AlexNet (Supervision)
  • 24. 24Image credit: Deep learning Tutorial (Stanford University) E2E: Classification: AlexNet (Supervision)
  • 25. 25Image credit: Deep learning Tutorial (Stanford University) E2E: Classification: AlexNet (Supervision)
  • 26. 26Image credit: Deep learning Tutorial (Stanford University) E2E: Classification: AlexNet (Supervision)
  • 27. 27 Rectified Linear Unit (non-linearity) f(x) = max(0,x) Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015) E2E: Classification: AlexNet (Supervision)
  • 28. 28 Dot Product Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015) E2E: Classification: AlexNet (Supervision)
  • 29. ImageNet Classification 2013 Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web] Slide credit: Rob Fergus (NYU) 29 E2E: Classification: ImageNet ILSRVC
  • 30. The development of better convnets is reduced to trial-and- error. 30 E2E: Classification: Visualize: ZF Visualization can help in proposing better architectures. Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing.
  • 31. “A convnet model that uses the same components (filtering, pooling) but in reverse, so instead of mapping pixels to features does the opposite.” Zeiler, Matthew D., Graham W. Taylor, and Rob Fergus. "Adaptive deconvolutional networks for mid and high level feature learning." Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011. 31 E2E: Classification: Visualize: ZF
  • 32. Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing. DeconvN et Conv Net 32 E2E: Classification: Visualize: ZF
  • 36. “To examine a given convnet activation, we set all other activations in the layer to zero and pass the feature maps as input to the attached deconvnet layer.” 36 E2E: Classification: Visualize: ZF
  • 38. “(i) Unpool: In the convnet, the max pooling operation is non-invertible, however we can obtain an approximate inverse by recording the locations of the maxima within each pooling region in a set of switch variables.” 38 E2E: Classification: Visualize: ZF
  • 39. XX “(ii) Rectification: The convnet uses ReLU non-linearities, which rectify the feature maps thus ensuring the feature maps are always positive.” 39 E2E: Classification: Visualize: ZF
  • 40. “(iii) Filtering: The convnet uses learned filters to convolve the feature maps from the previous layer. To approximately invert this, the deconvnet uses transposed versions of the same filters (as other autoencoder models, such as RBMs), but applied to the rectified maps, not the output of the layer beneath. In practice this means flipping each filter vertically and horizontally. XX XX 40 E2E: Classification: Visualize: ZF
  • 41. “(iii) Filtering: The convnet uses learned filters to convolve the feature maps from the previous layer. To approximately invert this, the deconvnet uses transposed versions of the same filters (as other autoencoder models, such as RBMs), but applied to the rectified maps, not the output of the layer beneath. In practice this means flipping each filter vertically and horizontally. XX XX 41 E2E: Classification: Visualize: ZF
  • 42. 42 Top 9 activations in a random subset of feature maps across the validation data, projected down to pixel space using our deconvolutional network approach. Corresponding image patches. E2E: Classification: Visualize: ZF
  • 46. Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing. 46 E2E: Classification: Visualize: ZF
  • 47. 47 The smaller stride (2 vs 4) and filter size (7x7 vs 11x11) results in more distinctive features and fewer “dead" features. AlexNet (Layer 1) Clarifai (Layer 1) E2E: Classification: Visualize: ZF
  • 48. 48 Cleaner features in Clarifai, without the aliasing artifacts caused by the stride 4 used in AlexNet. AlexNet (Layer 2) Clarifai (Layer 2) E2E: Classification: Visualize: ZF
  • 49. 49 Regularization with dropout: Reduction of overfitting by setting to zero the output of a portion (typically 50%) of the each intermediate neuron. Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580. Chicago E2E: Classification: Dropout: ZF
  • 52. ImageNet Classification 2013 Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web] -5% Slide credit: Rob Fergus (NYU) 52 E2E: Classification: ImageNet ILSRVC
  • 53. ImageNet Classification 2013 Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web] -5% Slide credit: Rob Fergus (NYU) 53 E2E: Classification: ImageNet ILSRVC
  • 56. E2E: Classification: GoogLeNet 56 ● 22 layers, but 12 times fewer parameters than AlexNet. Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions."
  • 57. E2E: Classification: GoogLeNet 57 ● Challenges of going deeper: ○ Overfitting, due to the increase amount of parameters. ○ Inefficient computation if most weights end up close to zero. Solution Sparsity How ? Inception modules
  • 59. E2E: Classification: GoogLeNet 59 Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
  • 60. E2E: Classification: GoogLeNet 60 Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
  • 61. E2E: Classification: GoogLeNet (NiN) 61 3x3 and 5x5 convolutions deal with different scales. Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides]
  • 62. 62 1x1 convolutions does dimensionality reduction (c3<c2) and accounts for rectified linear units (ReLU). Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides] E2E: Classification: GoogLeNet (NiN)
  • 63. 63 In NiN, the Cascaded 1x1 Convolutions compute reductions after the convolutions. Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides] E2E: Classification: GoogLeNet (NiN)
  • 64. E2E: Classification: GoogLeNet 64 In GoogLeNet, the Cascaded 1x1 Convolutions compute reductions before the expensive 3x3 and 5x5 convolutions.
  • 65. E2E: Classification: GoogLeNet 65 Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
  • 66. E2E: Classification: GoogLeNet 66 3x3 max pooling introduces somewhat spatial invariance, and has proven a benefitial effect by adding an alternative parallel path.
  • 67. E2E: Classification: GoogLeNet 67 Two Softmax Classifiers at intermediate layers combat the vanishing gradient while providing regularization at training time. ...and no fully connected layers needed !
  • 68. E2E: Classification: GoogLeNet 68NVIDIA, “NVIDIA and IBM CLoud Support ImageNet Large Scale Visual Recognition Challenge” (2015)
  • 69. E2E: Classification: GoogLeNet 69 Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." CVPR 2015. [video] [slides] [poster]
  • 70. E2E: Classification: VGG 70 Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
  • 71. E2E: Classification: VGG 71 Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
  • 72. E2E: Classification: VGG: 3x3 Stacks 72 Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
  • 73. E2E: Classification: VGG 73 Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project] ● No poolings between some convolutional layers. ● Convolution strides of 1 (no skipping).
  • 74. E2E: Classification 74 3.6% top 5 error… with 152 layers !!
  • 75. E2E: Classification: ResNet 75 He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]
  • 76. E2E: Classification: ResNet 76 ● Deeper networks (34 is deeper than 18) are more difficult to train. He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides] Thin curves: training error Bold curves: validation error
  • 77. E2E: Classification: ResNet 77 ● Residual learning: reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]
  • 78. E2E: Classification: ResNet 78 He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]
  • 79. 79 E2E: Classification: Humans Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]
  • 80. 80 E2E: Classification: Humans “Is this a Border terrier” ? Crowdsourcing Yes No ● Binary ground truth annotation from the crowd. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]
  • 81. 81 ● Annotation Problems: Carlier, Axel, Amaia Salvador, Ferran Cabezas, Xavier Giro-i-Nieto, Vincent Charvillat, and Oge Marques. "Assessment of crowdsourcing and gamification loss in user-assisted object segmentation." Multimedia Tools and Applications (2015): 1-28. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web] Crowdsource loss (0.3%) More than 5 objects classes E2E: Classification: Humans
  • 82. 82 Andrej Karpathy, “What I learned from competing against a computer on ImageNet” (2014) ● Test data collection from one human. [interface] E2E: Classification: Humans
  • 83. 83 Andrej Karpathy, “What I learned from competing against a computer on ImageNet” (2014) ● Test data collection from one human. [interface] “Aww, a cute dog! Would you like to spend 5 minutes scrolling through 120 breeds of dog to guess what species it is ?” E2E: Classification: Humans
  • 84. E2E: Classification: Humans 84 NVIDIA, “Mocha.jl: Deep Learning for Julia” (2015) ResNet
  • 86. 86
  • 91. 91 E2E: Saliency Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB 2015)
  • 92. 92 Eye Tracker Mouse Click Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB 2015) E2E: Saliency
  • 93. 93 E2E: Saliency: JuntingNet JuntingNet Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 94. 94 JuntingNet DATA iSun [Xu’15] SALICON [Jiang’15] E2E: Saliency: JuntingNet Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 95. 95 TRAIN VALIDATION TEST SALICON [Jiang’15] 10,000 5,000 5,000 iSun [Xu’15] 6,000 926 2,000 CAT2000 [Borji’15] 2,000 - 2,000 MIT300 [Judd’12] 300 - -Large Scale E2E: Saliency: JuntingNet Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 98. 98 Upsample + filter 2D map 96x96 2340=48x48 3 CONV LAYERS E2E: Saliency: JuntingNet
  • 99. 99 Upsample + filter 2D map 96x96 2340=48x48 2 DENSE LAYERS E2E: Saliency: JuntingNet
  • 102. 102 Loss function Mean Square Error (MSE) Weight initialization Gaussian distribution Learning rate 0.03 to 0.0001 Mini batch size 128 Training time 7h (SALICON) / 4h (iSUN) Acceleration SGD+ nesterov momentum (0.9) Regularisation Maxout norm GPU NVidia GTX 980 E2E: Saliency: JuntingNet Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 103. 103 Number of iterations (Training time) ● Back-propagation with the Euclidean distance. ● Training curve for the SALICON database. E2E: Saliency: JuntingNet Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 104. 104 JuntingNetGround TruthPixels E2E: Saliency: JuntingNet: iSUN Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 105. 105 JuntingNetGround TruthPixels E2E: Saliency: JuntingNet: iSUN Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 106. 106 Results from CVPR LSUN Challenge 2015 E2E: Saliency: JuntingNet: iSUN Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 107. 107 JuntingNetGround TruthPixels E2E: Saliency: JuntingNet: SALICON Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 108. 108 JuntingNetGround TruthPixels E2E: Saliency: JuntingNet: SALICON Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 109. 109 Results from CVPR LSUN Challenge 2015 E2E: Saliency: JuntingNet: SALICON Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 110. 110 E2E: Saliency: JuntingNet Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 111. Outline for this session in M5... 111 Part I: End-to-end learning (E2E) Domain B Fine-tuned Learned Representation Part I’: End-to-End Fine-Tuning (FT) Part I: End-to-end learning (E2E) Domain ALearned Representation Part I: End-to-end learning (E2E) Transfer
  • 112. 112 E2E: Fine-tuning Fine-tuning a pre-trained network Slide credit: Victor Campos, “Layer-wise CNN surgery for Visual Sentiment Prediction” (ETSETB 2015)
  • 113. 113 E2E: Fine-tuning Slide credit: Victor Campos, “Layer-wise CNN surgery for Visual Sentiment Prediction” (ETSETB 2015) Fine-tuning a pre-trained network
  • 114. 114 E2E: Fine-tuning: Sentiments CNN Campos, Victor, Amaia Salvador, Xavier Giro-i-Nieto, and Brendan Jou. "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction." In Proceedings of the 1st International Workshop on Affect & Sentiment in Multimedia, pp. 57-62. ACM, 2015.
  • 115. 115 Campos, Victor, Xavier Giro-i-Nieto, and Brendan Jou. “From pixels to sentiments” (Submitted) Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully Convolutional Networks for Semantic Segmentation." CVPR 2015 E2E: Fine-tuning: Sentiments True positive True negative False positive False negative Visualizations with fully convolutional networks.
  • 116. 116 E2E: Fine-tuning: Cultural events ChaLearn Workshop A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., “Cultural Event Recognition with Visual ConvNets and Temporal Models”, in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]
  • 117. Outline for this session in M5... 117 Task A (eg. image classification) Learned Representation Part I: End-to-end learning (E2E) Task B (eg. image retrieval) Part II: Off-The-Shelf features (OTS)
  • 118. 118 Razavian, Ali, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. "CNN features off-the-shelf: an astounding baseline for recognition." CVPRW 2014 Off-The-Shelf (OTS) Features
  • 119. ● Intermediate features can be used as regular visual descriptors for any task. 119 Off-The-Shelf (OTS) Features Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014
  • 120. 120 OTS: Classification: Razavian Razavian, Ali, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. "CNN features off-the-shelf: an astounding baseline for recognition." CVPRW 2014 Pascal VOC 2007
  • 121. 121 OTS: Classification: Return of devil Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A.. Return of the devil in the details: Delving deep into convolutional nets. BMVC 2014 Classifier L2-normalization Accuracy +5%
  • 122. Three representative architectures considered: AlexNet ZF OverFeat 5 days (fast) 3 weeks (slow) @ NVIDIA GTX Titan GPU 122 OTS: Classification: Return of devil Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A.. Return of the devil in the details: Delving deep into convolutional nets. BMVC 2014
  • 123. F C 123 OTS: Classification: Return of devil Data augmentation Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A.. Return of the devil in the details: Delving deep into convolutional nets. BMVC 2014
  • 124. 124 OTS: Classification: Return of devil Fisher Kernels (FK) ConvNets (CNN)
  • 125. Color Gray Scale (GS) Accuracy -2.5% 125 OTS: Classification: Return of devil Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A.. Return of the devil in the details: Delving deep into convolutional nets. BMVC 2014
  • 126. Dimensionality reduction by retraining the last layer to smaller sizes. Accuracy -2% Size x32 126 OTS: Classification: Return of devil Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A.. Return of the devil in the details: Delving deep into convolutional nets. BMVC 2014
  • 127. Ranking Summary of the paper by Amaia Salvador on Bitsearch. Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014. Springer International Publishing, 2014. 584-599. 127 OTS: Retrieval
  • 128. Oxford Buildings Inria Holidays UKB 128 OTS: Retrieval
  • 129. Pooled from the network from Krizhevsky et. al. pretrained with images from ImageNet. 129 OTS: Retrieval: FC layers Summary of the paper by Amaia Salvador on Bitsearch. Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014.
  • 130. Off-the-shelf CNN descriptors from fully connected layers show useful but not superior (w.r.t. FV, VLAD, Sparse Coding,...) 130Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014. OTS: Retrieval: FC layers
  • 131. Razavian et al, A baseline for visual instance retrieval with deep convolutional networks, ICLR 2015. 131 Convolutional layers have shown better performance than fully connected ones. OTS: Retrieval: Conv layers
  • 132. OTS: Retrieval: Conv layers Razavian et al, A baseline for visual instance retrieval with deep convolutional networks, ICLR 2015. 132 Spatial Search, (extract N local descriptor from predefined locations) increases performance at computational cost.
  • 133. Medium memory footprints Razavian et al, A baseline for visual instance retrieval with deep convolutional networks, ICLR 2015. 133 OTS: Retrieval: Conv layers
  • 134. Mohedano et al, Bag of Convolutional Local Features for Scalable Instance Search (submitted) 134 ... Instance Retrieval (Instance: Object, Building, Person, Place…) OTS: Retrieval: Bag of Words (BoW)
  • 135. Mohedano et al, Bags of Local Convolutional Features for Scalable Instance Search (ICMR 2016) 135 OTS: Retrieval: Bag of Words (BoW) An assignment map of BoF quantized features can be defined over the spatial locations of the convolutional feature maps.
  • 136. 136Mohedano et al, Bag of Convolutional Local Features for Scalable Instance Search (submitted) OTS: Retrieval: Bag of Words (BoW)
  • 137. 137 OTS: Summarization Bolaños M, Mestre R, Talavera E, Giró-i-Nieto X, Radeva P. Visual Summary of Egocentric Photostreams by Representative Keyframes. In: IEEE International Workshop on Wearable and Ego-vision Systems for Augmented Experience (WEsAX) 2015. Turin, Italy: 2015 Clustering based on Euclidean distance over FC7 features from AlexNet.
  • 138. 138 OTS: Reinforcement learning: DeepQ Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. "Playing atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013). (Google) DeepMind
  • 139. 139 OTS: Reinforcement learning: Deep Q Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. "Playing atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013). Demo: Deep Q learning Demo
  • 140. 140 OTS: Reinforcement learning Caicedo, Juan C., and Svetlana Lazebnik. "Active object localization with deep reinforcement learning." ICCV 2015 [Slides by Miriam Bellver] Other: Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. "Playing atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013). Object is localized based on visual features from AlexNet FC6.
  • 141. ConvNets: Learn more 141 The New York Times: “The Race Is On to Control Artificial Intelligence, and Tech’s Future” (25/03/2016)
  • 142. 142 Keras http://keras.io/ Caffe http://caffe.berkeleyvision.org/ Torch (Overfeat) http://torch.ch/ Theano http://deeplearning.net/software/theano/ Tensor Flow https://www.tensorflow.org/ MatconvNet (VLFeat) http://www.vlfeat.org/matconvnet/ CNTK (Mcrosoft) http://www.cntk.ai/ ConvNets: Frameworks
  • 144. Stanford course: CS231n: Convolutional Neural Networks for Visual Recognition 144 ConvNets: Learn more
  • 145. 145 ● Reading Group on Tuesdays (UPC) and Wednesdays (UB) at 11am. Schedule, list of paper, slides & video(s): https://github.com/imatge-upc/readcv ConvNets: Learn more Open to MCV students. E-mail me at xavier. giro@upc.edu before attending :)
  • 146. 146 ● Summer Course “Deep learning for computer vision” (4-8 July 2016) (Very) temptative program at: https://github.com/imatge-upc/etsetb-2016-dlcv ConvNets: Learn more E-mail me at xavier. giro@upc.edu for a free spot (subject to availability)
  • 147. 147 ● Broader and more friendly slides for dissemination. [Available on slideshare.net] ConvNets: Learn more