ConvNets: Software
Caffe http://caffe.berkeleyvision.org/
Torch (Overfeat) http://torch.ch/
Theano http://deeplearning.net...
Seminar Series:
Compacting ConvNets
for End to End Learning
Tuesday February 2, 4pm
D5-010 Campus Nord
ConvNets: Learn mor...
Stanford course:
CS231n:
Convolutional Neural
Networks for Visual
Recognition
ConvNets: Learn more
79
ConvNets: Learn more
Online course:
Deep Learning
Taking machine
learning to the next
level
80
ReadCV seminar
Friendly reviews of SoA papers
Spring 2016:
Tuesdays at 11am
ConvNets: Learn more
81
Barcelona
Convolucionada:
Deep Learning a l’abast
de tothom
Monday, February 1, 7pm @ FIB,
Campus Nord UPC
ConvNets: Learn...
Summer course
Deep Learning for
Computer Vision
(2.5 ECTS for MSc & Phd)
July 4-8, 3-7pm
ConvNets: Learn more
83
● Deep learning methods for vision (CVPR 2012)
● Tutorial on deep learning for vision (CVPR 2014)
● Kyunghyun Cho, “Deep Le...
ConvNets: Learn more
85
“Machine learning” sub-Reddit.
ConvNets: Learn more
86
ConvNets: Learn more
87
Check profile requirements for Summer internship (disclaimer: offered to Phd students by default)
...
ConvNets: Learn more
88
Video: Cristian Canton’s talk “From Catalonia to America: notes on how to achieve a successful pos...
Li Fei-Fei, “How we’re teaching
computers to understand pictures”
TEDTalks 2014.
ConvNets: Learn more
89
Jeremy Howard, “The wonderful
and terrifying implications of
computers that can learn”,
TEDTalks 2014.
ConvNets: Learn mor...
ConvNets: Learn more
91
● Neil Lawrence, OpenAI won’t benefit humanity without open data sharing
(The Guardian, 14/12/2015)
Is Computer
Vision solved ?
ConvNets: Discussion
92
Sports: Do you know them ?
93
ConvNets: Do you know them ?
94
Antonio Torralba, MIT
(former UPC)
...and MANY MORE I am missing in the page (apologies).
...
95
ConvNets: Where you are studying
VisioCat dinner
@ CVPR 2015
Considering a Phd at GPI-UPC ?
Currently, no direct funding available (check in the future).
We can support your applicati...
Image Classification
97
Our past research
A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i...
Saliency Prediction
J. Pan and Giró-i-Nieto, X., “End-to-end Convolutional Network for Saliency Prediction”, in Large-scal...
Sentiment Analysis
99
Our current research
[Slides]
CNN
V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., “Diving De...
Our current research
Instance Search in Video
100
V. - T. Nguyen, -Dinh-Le, D., Salvador, A., -Zhu, C., Nguyen, D. - L., T...
Thank you !
Slides available on and .
https://imatge.upc.edu/web/people/xavier-giro
http://bitsearch.blogspot.com
https://...
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Session 10 in module 3 of the Master in Computer Vision by UPC, UAB, UOC & UPF.
This lecture provides an overview of state-of-the-art applications of convolutional neural networks to video processing problems: semantic recognition, optical flow estimation and object tracking.

  1. 1. @DocXavi Module 3 - Lecture 10 Deep Convnets for Video Processing 28 January 2016 Xavier Giró-i-Nieto [http://pagines.uab.cat/mcv/]
  2. 2. Acknowledgments 2
  3. 3. Linked slides
  4. 4. Motivation
  5. 5. Motivation [Website]
  6. 6. Outline 1. Recognition 2. Optical Flow 3. Object Tracking 4. Learn more 6
  7. 7. Recognition Demo: Clarifai MIT Technology Review : “A start-up’s Neural Network Can Understand Video” (3/2/2015) 7
  8. 8. Figure: Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. 8 Recognition
  9. 9. 9 Recognition Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
  10. 10. 10 Recognition Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 Previous lectures with Jose M. Álvarez
  11. 11. 11 Recognition Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
  12. 12. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. Slides extracted from ReadCV seminar by Victor Campos 12 Recognition: DeepVideo
  13. 13. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. 13 Recognition: DeepVideo: Demo
  14. 14. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. 14 Recognition: DeepVideo: Architectures
  15. 15. Recognition: DeepVideo: Features. Unsupervised learning [Le et al. '11]; supervised learning [Karpathy et al. '14]. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE.
  16. 16. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. 16 Recognition: DeepVideo: Multiscale
  17. 17. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. 17 Recognition: DeepVideo: Results
  18. 18. 18 Recognition Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
  19. 19. 19 Recognition: C3D Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
  20. 20. 20 Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 Recognition: C3D: Demo
  21. 21. Recognition: C3D: Spatial dimension. The spatial dimensions (XY) of the kernels are fixed to 3x3, following Simonyan & Zisserman (ICLR 2015). K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015.
  22. 22. Recognition: C3D: Temporal dimension. 3D ConvNets are more suitable than 2D ConvNets for spatiotemporal feature learning. Figure labels: temporal depth; 2D ConvNets. Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
  23. 23. Recognition: C3D: Temporal dimension. A homogeneous architecture with small 3 × 3 × 3 convolution kernels in all layers is among the best-performing architectures for 3D ConvNets. Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
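One commonly cited motivation for small kernels, taken from the VGG paper referenced on slide 21 rather than from the C3D experiments themselves, is that a stack of small kernels covers the same receptive field as one large kernel with fewer weights. The sketch below works through that arithmetic for 3D kernels. The channel count of 64 and the helper names are illustrative assumptions, not values from the slides.

```python
def conv3d_params(c_in, c_out, k_t, k_h, k_w, bias=True):
    """Number of learnable parameters in one 3D convolution layer."""
    weights = c_in * c_out * k_t * k_h * k_w
    return weights + (c_out if bias else 0)


def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field (per axis) of stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


channels = 64  # illustrative assumption, not taken from the slides

# Two stacked 3x3x3 layers see a 5x5x5 neighbourhood...
rf = stacked_receptive_field(num_layers=2, kernel=3)
two_small = 2 * conv3d_params(channels, channels, 3, 3, 3, bias=False)

# ...the same neighbourhood a single 5x5x5 layer would see.
one_large = conv3d_params(channels, channels, 5, 5, 5, bias=False)

print(rf, two_small, one_large)
```

Under these assumptions the two small layers use fewer than half the weights of the single large layer while adding an extra non-linearity between them.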
  24. 24. 24 Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 No gain when varying the temporal depth across layers. Recognition: C3D: Temporal dimension
  25. 25. 25 Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 No gain when varying the temporal depth across layers. Recognition: C3D: Architecture Feature vector
  26. 26. Recognition: C3D: Feature vector. The video sequence is split into 16-frame clips, with an 8-frame overlap between consecutive clips. Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
  27. 27. Recognition: C3D: Feature vector. The descriptors of the 16-frame clips are averaged into a 4096-dim video descriptor, which is then L2-normalized. Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
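The clip pipeline on the two slides above can be sketched in a few lines. This is a minimal illustration, assuming the per-clip feature vectors have already been computed by some network. The function names are hypothetical, and the toy vectors are 2-dimensional rather than 4096-dimensional.

```python
import math


def split_into_clips(num_frames, clip_len=16, overlap=8):
    """Return (start, end) frame ranges of overlapping clips."""
    stride = clip_len - overlap
    return [(s, s + clip_len)
            for s in range(0, num_frames - clip_len + 1, stride)]


def video_descriptor(clip_features):
    """Average the clip descriptors, then L2-normalize the result."""
    n = len(clip_features)
    dim = len(clip_features[0])
    mean = [sum(f[i] for f in clip_features) / n for i in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean))
    return [v / norm for v in mean] if norm > 0 else mean


clips = split_into_clips(num_frames=40)
print(clips)

# Toy stand-ins for per-clip network features (hypothetical values).
descriptor = video_descriptor([[3.0, 0.0], [3.0, 8.0]])
print(descriptor)
```

Frames at the end of the video that do not fill a whole clip are simply dropped in this sketch; how the original implementation handles them is not stated in the slides.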
  28. 28. 28 Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 Recognition: C3D: Visualization Based on Deconvnets by Zeiler and Fergus [ECCV 2014] - See [ReadCV Slides] for more details.
  29. 29. 29 Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 Recognition: C3D: Compactness
  30. 30. Recognition: C3D: Performance. Convolutional 3D (C3D) features combined with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with state-of-the-art methods on 2 other benchmarks. Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
  31. 31. 31 Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 Recognition: C3D: Software Implementation by Michael Gygli (GitHub)
  32. 32. 32 Recognition: ImageNet Video [ILSVRC 2015 Slides and videos]
  33. 33. 33 Recognition: ImageNet Video [ILSVRC 2015 Slides and videos]
  34. 34. 34 Recognition: ImageNet Video [ILSVRC 2015 Slides and videos]
  35. 35. 35 Recognition: ImageNet Video [ILSVRC 2015 Slides and videos]
  36. 36. 36 Recognition: ImageNet Video Kai Kang et al, Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]
  37. 37. 37 Recognition: ImageNet Video Kai Kang et al, Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]
  38. 38. 38 Recognition: ImageNet Video Kai Kang et al, Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]
  39. 39. 39 Recognition: ImageNet Video Kai Kang et al, Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]
  40. 40. Optical Flow Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE 40
  41. 41. Optical Flow: Small vs Large Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE 41
  42. 42. Optical Flow. Classic approach: rigid matching of HoG or SIFT descriptors. Deep Matching: allow each subpatch to move (a) independently and (b) within a limited range that depends on its size. Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE
  43. 43. Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE 43 Optical Flow: Deep Matching
  44. 44. Optical Flow: 2D correlation. Figure labels: image; sub-image; offset of the sub-image with respect to the image [0,0]. Source: Matlab R2015b documentation for normxcorr2 by Mathworks.
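To make the 2D correlation idea concrete, the sketch below slides a small template over an image and scores each position with normalized cross-correlation. It is a plain-Python illustration, not a port of Matlab's normxcorr2: it only evaluates positions where the template fits entirely inside the image, and the function names are hypothetical.

```python
import math


def normalized_cross_correlation(image, template):
    """Score every fully-overlapping placement of template in image."""
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    scores = []
    for y in range(H - h + 1):
        row_scores = []
        for x in range(W - w + 1):
            win = [image[y + dy][x + dx] for dy in range(h) for dx in range(w)]
            win_mean = sum(win) / len(win)
            win_dev = [v - win_mean for v in win]
            win_norm = math.sqrt(sum(d * d for d in win_dev))
            denom = t_norm * win_norm
            num = sum(a * b for a, b in zip(win_dev, t_dev))
            # Flat windows have zero variance; give them a neutral score.
            row_scores.append(num / denom if denom > 0 else 0.0)
        scores.append(row_scores)
    return scores


def best_offset(scores):
    """(row, column) of the highest correlation score."""
    _, y, x = max((v, y, x)
                  for y, row in enumerate(scores)
                  for x, v in enumerate(row))
    return y, x


image = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 2, 0],
    [0, 0, 3, 4, 0],
    [0, 0, 0, 0, 0],
]
template = [[1, 2], [3, 4]]
scores = normalized_cross_correlation(image, template)
print(best_offset(scores))
```

The peak of the score map gives the offset of the sub-image with respect to the image origin, which is the quantity the slide illustrates.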
  45. 45. Optical Flow: Deep Matching. Instead of pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch. Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE
  46. 46. Optical Flow: Deep Matching. Figure labels: the most discriminative response map; the least discriminative response map. Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE
  47. 47. Optical Flow: Deep Matching. Key idea: build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search. Pyramid levels: 4x4, 8x8, 16x16 and 32x32 patches. Stages: bottom-up extraction (BU) and top-down matching (TD). Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE
  48. 48. Optical Flow: Deep Matching. Key idea: build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search. Pyramid levels: 4x4, 8x8, 16x16 and 32x32 patches. Stage shown: bottom-up extraction (BU). Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE
  49. 49. Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE 49 Optical Flow: Deep Matching (BU)
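The bottom-up step can be illustrated with a heavily simplified sketch: the correlation map of a parent patch is built by averaging the maps of its four sub-patches, after a small max-pooling that lets each sub-patch shift by one pixel. The published algorithm includes further steps (such as subsampling) that are omitted here, and the function names, map size and offsets are assumptions made for the example.

```python
def max_pool_3x3(m):
    """Stride-1 3x3 max-pooling: tolerates a one-pixel shift."""
    H, W = len(m), len(m[0])
    return [[max(m[yy][xx]
                 for yy in range(max(0, y - 1), min(H, y + 2))
                 for xx in range(max(0, x - 1), min(W, x + 2)))
             for x in range(W)]
            for y in range(H)]


def aggregate_children(child_maps, offsets, tolerate_shift=True):
    """Average child correlation maps, each read at its own offset.

    child_maps: one correlation map per sub-patch (all the same size).
    offsets: (dy, dx) of each sub-patch relative to the parent position.
    """
    H, W = len(child_maps[0]), len(child_maps[0][0])
    maps = [max_pool_3x3(m) for m in child_maps] if tolerate_shift else child_maps
    parent = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            total = 0.0
            for m, (dy, dx) in zip(maps, offsets):
                yy, xx = y + dy, x + dx
                if 0 <= yy < H and 0 <= xx < W:
                    total += m[yy][xx]
            parent[y][x] = total / len(maps)
    return parent


def peak_map(size, y, x):
    """Toy correlation map: zero everywhere except one perfect match."""
    m = [[0.0] * size for _ in range(size)]
    m[y][x] = 1.0
    return m


offsets = [(0, 0), (0, 2), (2, 0), (2, 2)]
# Parent sits at (1, 1); the last sub-patch moved one pixel to the right.
children = [peak_map(6, 1, 1), peak_map(6, 1, 3),
            peak_map(6, 3, 1), peak_map(6, 3, 4)]

rigid = aggregate_children(children, offsets, tolerate_shift=False)
deformable = aggregate_children(children, offsets, tolerate_shift=True)
print(rigid[1][1], deformable[1][1])
```

In this toy case the rigid average loses the displaced sub-patch, while the pooled version still gives the parent a full score, which is the behaviour the "allow each subpatch to move" idea is after.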
  50. 50. Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE 50 Key idea: Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search. Optical Flow: Deep Matching (TD) 4x4 patches 8x8 patches 16x16 patches 32x32 patches Top-down matching (TD)
  51. 51. Optical Flow: Deep Matching (TD). Each local maximum in the top layer corresponds to a shift of one of the largest (32x32) patches. Starting from a local maximum, we can retrieve the corresponding responses one scale below and focus on the shifts of the sub-patches that generated it. Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE
  52. 52. Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE 52 Optical Flow: Deep Matching (TD)
  53. 53. Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE 53 Optical Flow: Deep Matching
  54. 54. Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE 54 Ground truth Dense HOG [Brox & Malik 2011] Deep Matching Optical Flow: Deep Matching
  55. 55. Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE 55 Optical Flow: Deep Matching
  56. 56. Optical Flow Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 56
  57. 57. Optical Flow: FlowNet Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 57
  58. 58. Optical Flow: FlowNet Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 58 End to end supervised learning of optical flow.
  59. 59. Optical Flow: FlowNet (contracting) Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 59 Option A: Stack both input images together and feed them through a generic network.
  60. 60. Optical Flow: FlowNet (contracting) Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 60 Option B: Create two separate, yet identical processing streams for the two images and combine them at a later stage.
  61. 61. Optical Flow: FlowNet (contracting) Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 61 Option B: Create two separate, yet identical processing streams for the two images and combine them at a later stage. Correlation layer: Convolution of data patches from the layers to combine.
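A correlation layer of the kind described for Option B can be sketched as follows: for every location in the first feature map, take the dot product with the feature vectors of the second map at a range of displacements. This is a simplified illustration that compares single feature vectors rather than patches and uses a tiny displacement range. The function name and the `max_disp` parameter are assumptions.

```python
def correlation_layer(f1, f2, max_disp=1):
    """Correlate two feature maps given as [H][W][C] nested lists.

    Returns (out, disps), where out[y][x][k] is the dot product between
    f1 at (y, x) and f2 at (y, x) displaced by disps[k].
    """
    H, W = len(f1), len(f1[0])
    disps = [(dy, dx)
             for dy in range(-max_disp, max_disp + 1)
             for dx in range(-max_disp, max_disp + 1)]
    out = [[[0.0] * len(disps) for _ in range(W)] for _ in range(H)]
    for y in range(H):
        for x in range(W):
            for k, (dy, dx) in enumerate(disps):
                yy, xx = y + dy, x + dx
                if 0 <= yy < H and 0 <= xx < W:  # zero outside the map
                    out[y][x][k] = sum(a * b for a, b in zip(f1[y][x], f2[yy][xx]))
    return out, disps


def blank(size=3, channels=2):
    return [[[0.0] * channels for _ in range(size)] for _ in range(size)]


# The same feature vector appears one pixel to the right in the second map.
f1 = blank()
f1[1][1] = [1.0, 2.0]
f2 = blank()
f2[1][2] = [1.0, 2.0]

out, disps = correlation_layer(f1, f2)
best = disps[max(range(len(disps)), key=lambda k: out[1][1][k])]
print(best)
```

The output has one channel per candidate displacement, so later layers can read off which shift matches best at each location.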
  62. 62. Optical Flow: FlowNet (expanding). Upconvolutional layers: unpooling of feature maps followed by a convolution. The upconvolved feature maps are concatenated with the corresponding maps from the contracting part. Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).
  63. 63. Optical Flow: FlowNet. Data augmentation: since existing ground-truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color). ConvNets trained on this unrealistic data generalize well to existing datasets such as Sintel and KITTI. Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).
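The photometric part of such an augmentation can be sketched as below for a grayscale image with values in [0, 1]. The parameter ranges and the function name are illustrative assumptions, not the settings used by the authors. Geometric transformations are left out because they would also have to be applied to the ground-truth flow field.

```python
import random


def augment_photometric(image, rng, noise_sigma=0.02, brightness=0.1,
                        contrast=(0.8, 1.2), gamma=(0.7, 1.5)):
    """Randomly change contrast, brightness and gamma, then add noise.

    image: 2D list of floats in [0, 1]; rng: a random.Random instance.
    All ranges are hypothetical defaults chosen for this sketch.
    """
    b = rng.uniform(-brightness, brightness)
    c = rng.uniform(*contrast)
    g = rng.uniform(*gamma)
    out = []
    for row in image:
        new_row = []
        for v in row:
            v = (v - 0.5) * c + 0.5 + b       # contrast and brightness
            v = min(1.0, max(0.0, v)) ** g     # gamma on the clipped value
            v += rng.gauss(0.0, noise_sigma)   # additive Gaussian noise
            new_row.append(min(1.0, max(0.0, v)))
        out.append(new_row)
    return out


image = [[0.0, 0.5], [1.0, 0.25]]
augmented = augment_photometric(image, random.Random(0))
print(augmented)
```

Passing a seeded `random.Random` keeps the augmentation reproducible, which helps when debugging a training pipeline.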
  64. 64. Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 64 Optical Flow: FlowNet
  65. 65. Object tracking: MDNet 65 Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015)
  66. 66. Object tracking: MDNet 66 Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015)
  67. 67. Object tracking: MDNet: Architecture 67 Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015) Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.
  68. 68. Object tracking: MDNet: Online update 68 Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015) MDNet is updated online at test time with hard negative mining, that is, selecting negative samples with the highest positive score.
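The selection rule stated on this slide reduces to a top-k over the classifier scores of the negative samples. The sketch below assumes those scores are already available as a list. The function name and the example values are hypothetical.

```python
def hard_negatives(neg_scores, k):
    """Indices of the k negative samples with the highest positive score."""
    order = sorted(range(len(neg_scores)),
                   key=lambda i: neg_scores[i],
                   reverse=True)
    return order[:k]


# Positive-class scores the current model assigns to five negative samples.
scores = [0.1, 0.9, 0.4, 0.7, 0.05]
print(hard_negatives(scores, k=2))
```

The selected samples are the ones the current model is most likely to mistake for the target, so training on them focuses the update on the confusing cases.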
  69. 69. Object tracking: FCNT 69 Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code]
  70. 70. Object tracking: FCNT. Focus on the conv4-3 and conv5-3 layers of the VGG-16 network pre-trained for ImageNet image classification. Figure labels: conv4-3; conv5-3. Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code]
  71. 71. Object tracking: FCNT: Specialization. Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence. Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code]
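One simple way to see which feature maps relate to the foreground is to measure how much of each map's activation falls inside the target region. The sketch below does exactly that; it is a stand-in heuristic chosen for illustration and is not the selection procedure used in the FCNT paper. The function names and the binary `mask` input are assumptions.

```python
def foreground_ratio(fmap, mask):
    """Fraction of a feature map's absolute activation inside the mask.

    fmap: 2D list of activations.
    mask: 2D list of 0/1 values with the same shape (1 = foreground).
    """
    inside = 0.0
    total = 0.0
    for frow, mrow in zip(fmap, mask):
        for value, is_fg in zip(frow, mrow):
            energy = abs(value)
            total += energy
            if is_fg:
                inside += energy
    return inside / total if total > 0 else 0.0


def select_maps(fmaps, mask, k):
    """Indices of the k feature maps most concentrated on the foreground."""
    scores = [foreground_ratio(fmap, mask) for fmap in fmaps]
    order = sorted(range(len(fmaps)), key=lambda i: scores[i], reverse=True)
    return order[:k]
```

Maps whose ratio is close to zero respond mainly to the background and could be discarded for the sequence at hand.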
  72. 72. Object tracking: FCNT: Localization. Although trained for image classification, the feature maps in conv5-3 enable object localization, but they are not discriminative enough to distinguish different objects of the same category. Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code]
  73. 73. Object tracking: Localization. Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." ICLR 2015. [Slides from ReadCV]
  74. 74. Object tracking: FCNT: Localization. On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation. Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code]
  75. 75. Object tracking: FCNT: Architecture. SNet = Specific Network (updated online); GNet = General Network (fixed). Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code]
  76. 76. Object tracking: FCNT: Results. Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code]
  77. 77. ConvNets: Software ● Caffe: http://caffe.berkeleyvision.org/ ● Torch (Overfeat): http://torch.ch/ ● Theano: http://deeplearning.net/software/theano/ ● TensorFlow: https://www.tensorflow.org/ ● MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet/ ● CNTK (Microsoft): http://www.cntk.ai/
  78. 78. ConvNets: Learn more. Seminar Series: “Compacting ConvNets for End to End Learning”, Jose M Álvarez. Tuesday February 2, 4pm, D5-010, Campus Nord.
  79. 79. ConvNets: Learn more. Stanford course CS231n: Convolutional Neural Networks for Visual Recognition.
  80. 80. ConvNets: Learn more. Online course: “Deep Learning: Taking machine learning to the next level”.
  81. 81. ConvNets: Learn more. ReadCV seminar: friendly reviews of state-of-the-art (SoA) papers. Spring 2016: Tuesdays at 11am.
  82. 82. ConvNets: Learn more. Barcelona Convolucionada: Deep Learning a l’abast de tothom (“Deep learning within everyone’s reach”). Monday, February 1, 7pm @ FIB, Campus Nord UPC. Grup d’estudi de machine learning Barcelona (Barcelona machine learning study group).
  83. 83. ConvNets: Learn more. Summer course: Deep Learning for Computer Vision (2.5 ECTS for MSc & PhD). July 4-8, 3-7pm.
  84. 84. ConvNets: Learn more ● Deep learning methods for vision (CVPR 2012) ● Tutorial on deep learning for vision (CVPR 2014) ● Kyunghyun Cho, “Deep Learning: Past, Present & Future”
  85. 85. ConvNets: Learn more. “Machine learning” sub-Reddit.
  86. 86. ConvNets: Learn more
  87. 87. ConvNets: Learn more. Check profile requirements for summer internships (disclaimer: offered to PhD students by default).
      Company     Avg salary / hour    Avg salary / month
      Yahoo       $43                  ($43 x 160 = $6,880)
      Apple       $37                  ($37 x 160 = $5,920)
      Google      $29.54-$31.32        $7,151
      Facebook    $22.92               $6,150-$7,378
      Microsoft   $22.63               $6,506-$7,171
      Source: Glassdoor.com (internships in California; no stipends included).
  88. 88. ConvNets: Learn more. Video: Cristian Canton’s talk “From Catalonia to America: notes on how to achieve a successful post-PhD career” @ ACMCV 2015 & UPC.
  89. 89. ConvNets: Learn more. Li Fei-Fei, “How we’re teaching computers to understand pictures”, TEDTalks 2014.
  90. 90. ConvNets: Learn more. Jeremy Howard, “The wonderful and terrifying implications of computers that can learn”, TEDTalks 2014.
  91. 91. ConvNets: Learn more ● Neil Lawrence, “OpenAI won’t benefit humanity without open data sharing” (The Guardian, 14/12/2015)
  92. 92. ConvNets: Discussion. Is Computer Vision solved?
  93. 93. Sports: Do you know them?
  94. 94. ConvNets: Do you know them? ● Antonio Torralba, MIT (former UPC) ● Oriol Vinyals, Google (former UPC) ● Jose M Álvarez, NICTA (former URL & UAB) ● Joan Bruna, Berkeley (former UPC) ...and MANY MORE I am missing on this page (apologies).
  95. 95. ConvNets: Where you are studying. VisioCat dinner @ CVPR 2015.
  96. 96. Considering a PhD at GPI-UPC? Currently, no direct funding is available (check in the future). We can support your application to scholarships. External grant listings: UPC, UPF. Last deadlines by funding institution (as of 28/1/2016): FI (Catalonia), 22/09/2015; FPU (Spain), 15/01/2016. Check our activity at https://imatge.upc.edu/web/
  97. 97. Our past research: Image Classification. A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., “Cultural Event Recognition with Visual ConvNets and Temporal Models”, in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]
  98. 98. Our current research: Saliency Prediction (LSUN Challenge). J. Pan and Giró-i-Nieto, X., “End-to-end Convolutional Network for Saliency Prediction”, in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]
  99. 99. Our current research: Sentiment Analysis. V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., “Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction”, in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]
  100. 100. Our current research: Instance Search in Video. V. - T. Nguyen, -Dinh-Le, D., Salvador, A., -Zhu, C., Nguyen, D. - L., Tran, M. - T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. 'ichi, and Giró-i-Nieto, X., “NII-HITACHI-UIT at TRECVID 2015 Instance Search”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015. K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. 'ichi, O'Connor, N., and Smeaton, A. F., “Insight DCU at TRECVID 2015”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015. ...
  101. 101. Thank you! Slides available on and . https://imatge.upc.edu/web/people/xavier-giro http://bitsearch.blogspot.com https://twitter.com/DocXavi https://www.facebook.com/ProfessorXavi xavier.giro@upc.edu
