SlideShare une entreprise Scribd logo
1  sur  53
Télécharger pour lire hors ligne
@DocXavi
Xavier Giró-i-Nieto
[http://pagines.uab.cat/mcv/]
Module 6
Deep Learning for Video:
Architectures
26th February 2019
Our team
Xavier Giro-i-Nieto
• Web: https://imatge.upc.edu/web/people/xavier-giro
Associate Professor at Universitat Politecnica de Catalunya (UPC)
IDEAI Center for
Intelligent Data Science
& Artificial Intelligence
Our team & Our Deep Research
DL online courses by UPC TelecomBCN:
You can audit this one.
● MSc course [2017] [2018]
● BSc course [2018] [2019]
● 1st edition (2016)
● 2nd edition (2017)
● 3rd edition (2018)
● 4th edition (2019)
● 1st edition (2017)
● 2nd edition (2018)
● 3rd edition - NLP (2019)
Acknowledgements
5
Víctor Campos Alberto MontesAmaia Salvador Santiago Pascual
Video lecture
6
Deep Learning for Computer Vision (UPC 2018)
Víctor Campos
victor.campos@bsc.es
PhD Candidate
Barcelona Supercomputing Center
7
Densely linked slides
Outline
8
1. Architectures
2. Tips and tricks
9
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. . Large-scale video
classification with convolutional neural networks. CVPR 2014
Motivation
10
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. Large-scale video classification with
convolutional neural networks. CVPR 2014.
What is a video?
11
● Formally, a video is a 3D signal
○ Spatial coordinates: x, y
○ Temporal coordinate: t
● If we fix t, we obtain an image. We can understand
videos as sequences of images (a.k.a. frames)
Convolutional Neural Networks (CNN) provide state of the
art performance on image analysis tasks
How do we work with images?
12
How can we extend CNNs to image sequences?
How do we work with videos ?
13
f ŷ
14
Basic deep architectures for video:
1. Single frame models
2. CNN + RNN
3. 3D convolutions
4. Two-stream CNN
Deep Video Architectures
15
Basic deep architectures for video:
1. Single frame models
2. CNN + RNN
3. 3D convolutions
4. Two-stream CNN
Deep Video Architectures
Single frame models
16
CNN CNN CNN...
Combination method
Combination is commonly
implemented as a small NN on
top of a pooling operation
(e.g. max, sum, average).
Problem: pooling is not aware
of the temporal order!
Yue-Hei Ng, Joe, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, and
George Toderici. "Beyond short snippets: Deep networks for video classification." CVPR 2015
ŷ
17
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. . Large-scale video
classification with convolutional neural networks. CVPR 2014
Single frame models
18
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. . Large-scale video
classification with convolutional neural networks. CVPR 2014
Single frame models
19
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. . Large-scale video
classification with convolutional neural networks. CVPR 2014
Single frame models
20
João Carreira, Viorica Pătrăucean, Laurent Mazare, Andrew Zisserman, Simon Osindero, “Massively Parallel Video
Networks” ECCV 2018 [video]
Single frame models: Temporal redundancy
21
Basic deep architectures for video:
1. Single frame models
2. CNN + RNN
3. 3D convolutions
4. Two-stream CNN
Deep Video Architectures
22Slide credit: Santi Pascual
If we have a sequence of samples...
predict sample x[t+1] knowing previous values {x[t], x[t-1], x[t-2], …, x[t-τ]}
Limitation of Feed Forward NN (as CNNs)
23Slide credit: Santi Pascual
Feed Forward approach:
● static window of size L
● slide the window time-step wise
...
...
...
x[t+1]
x[t-L], …, x[t-1], x[t]
x[t+1]
L
Limitation of Feed Forward NN (as CNNs)
24Slide credit: Santi Pascual
Feed Forward approach:
● static window of size L
● slide the window time-step wise
...
...
...
x[t+2]
x[t-L+1], …, x[t], x[t+1]
...
...
...
x[t+1]
x[t-L], …, x[t-1], x[t]
x[t+2]
L
Limitation of Feed Forward NN (as CNNs)
25Slide credit: Santi Pascual 25
Feed Forward approach:
● static window of size L
● slide the window time-step wise
x[t+3]
L
...
...
...
x[t+3]
x[t-L+2], …, x[t+1], x[t+2]
...
...
...
x[t+2]
x[t-L+1], …, x[t], x[t+1]
...
...
...
x[t+1]
x[t-L], …, x[t-1], x[t]
Limitation of Feed Forward NN (as CNNs)
...
...
...
x1, x2, …, xL
Problems for the feed forward + static window approach:
● What’s the matter increasing L? → Fast growth of num of parameters!
● Decisions are independent between time-steps!
○ The network doesn’t care about what happened at previous time-step, only present window
matters → doesn’t look good
● Cumbersome padding when there are not enough samples to fill L size
○ Can’t work with variable sequence lengths
x1, x2, …, xL, …, x2L
...
...
x1, x2, …, xL, …, x2L, …, x3L
...
...
... ...
Slide credit: Santi Pascual
Limitation of Feed Forward NN (as CNNs)
27
The hidden layers and the
output depend from previous
states of the hidden layers
Recurrent Neural Network (RNN)
2D CNN + RNN
28
CNN CNN CNN...
Recurrent Neural Networks
are well suited for processing
sequences.
Problem: RNNs are sequential
and cannot be parallelized.
RNN RNN RNN...
Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko,
Trevor Darrel. Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR 2015. code
Videolectures on RNNs:
DLSL 2017, “RNN (I)”
“RNN (II)”
DLAI 2018, “RNN”
29
Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko,
Trevor Darrel. Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR 2015. code
2D CNN + RNN
30
Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. “Skip RNN: Learning to
Skip State Updates in Recurrent Neural Networks”, ICLR 2018.
2D CNN + RNN: redundancy
CNN CNN CNN...
RNN RNN RNN... After processing a frame, let
the RNN decide how many
future frames can be skipped
In skipped frames, simply
copy the output and state
from the previous time step
There is no ground truth for
which frames can be skipped.
The RNN learns it by itself
during training!
31
Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. “Skip RNN: Learning to
Skip State Updates in Recurrent Neural Networks”, ICLR 2018.
Used
Unused
2D CNN + RNN: redundancy
32
Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. “Skip RNN: Learning to
Skip State Updates in Recurrent Neural Networks”, ICLR 2018.
2D CNN + RNN
Epochs for Skip LSTM (λ = 10-4
)
~30% acc ~50% acc ~70% acc ~95% acc
11 epochs 51 epochs 101 epochs 400 epochs
Used
Unused
33
Basic deep architectures for video:
1. Single frame models
2. CNN + RNN
3. 3D convolutions
4. Two-stream CNN
Deep Video Architectures
3D CNN (C3D)
34
We can add an extra dimension to standard CNNs:
● An image is a HxWxD tensor: MxNxD’ conv filters
● A video is a TxHxWxD tensor: KxMxNxD’ conv filters
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning
spatiotemporal features with 3D convolutional networks" ICCV 2015
35
The video needs to be split into chunks (also known as clips) with a number of
frames that fits the receptive field of the C3D. Usually clips have 16 frames.
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning
spatiotemporal features with 3D convolutional networks" ICCV 2015
16-frame clip
16-frame clip
16-frame clip
...
Average
4096-dimvideodescriptor
4096-dimvideodescriptor
L2 norm
3D CNN (C3D)
36
3D convolutions + Residual connections
Tran, Du, Jamie Ray, Zheng Shou, Shih-Fu Chang, and Manohar Paluri. "Convnet architecture search
for spatiotemporal feature learning." arXiv preprint arXiv:1708.05038 (2017). [code]
3D CNN (Res3D)
37
Limitation:
● How can we use pre-trained networks?
3D CNN
38
Re-use pre-trained 2D CNNs for 3D CNNs
We can add an extra dimension to standard CNNs:
● An image is a HxWxD tensor: MxNxD’ conv filters
● A video is a TxHxWxD tensor: KxMxNxD’ conv filters
We can convert an MxNxD’ conv filter into a KxMxNxD’ filter by replicating it K
times in the time axis and scaling its values by 1/K.
● This allows to leverage networks pre-trained on ImageNet and alleviate
the computational burden associated to training from scratch
Feichtenhofer et al., Spatiotemporal Residual Networks for Video Action Recognition, NIPS 2016
Carreira et al., Quo vadis, action recognition? A new model and the kinetics dataset, CVPR 2017
39
Limitations:
● How can we handle longer videos?
● How can we capture longer temporal dependencies?
3D CNN
40
3D CNN + RNN
A. Montes, Salvador, A., Pascual-deLaPuente, S., and Giró-i-Nieto, X., “Temporal Activity Detection in
Untrimmed Videos with Recurrent Neural Networks”, NIPS Workshop 2016 (best poster award)
41
Basic deep architectures for video:
1. Single frame models
2. CNN + RNN
3. 3D convolutions
4. Two-stream CNN
Deep Video Architectures
Two-streams 2D CNNs
42
Problem: Single frame models do not take into account motion in videos.
Solution: extract optical flow for a stack of frames and use it as an input to a
CNN.
Simonyan, Karen, and Andrew Zisserman. "Two-stream convolutional networks for action recognition in
videos." NIPS 2014.
Fusion
Two-streams 2D CNNs
43Feichtenhofer, Christoph, Axel Pinz, and Andrew Zisserman. "Convolutional two-stream network fusion for video action
recognition." CVPR 2016. [code]
Fusion
Problem: Single frame models do not take into account motion in videos.
Solution: extract optical flow for a stack of frames and use it as an input to a
CNN.
Two-streams 2D CNNs
44
Feichtenhofer, Christoph, Axel Pinz, and Richard Wildes. "Spatiotemporal residual networks for video action recognition."
NIPS 2016. [code]
Problem: Single frame models do not take into account motion in videos.
Solution: extract optical flow for a stack of frames and use it as an input to a
CNN.
Two-streams 3D CNNs
45
Carreira, J., & Zisserman, A. . Quo vadis, action recognition? A new model and the kinetics dataset. CVPR
2017. [code]
Two-streams 3D CNNs
46
Carreira, J., & Zisserman, A. . Quo vadis, action recognition? A new model and the kinetics dataset. CVPR
2017. [code]
Two-streams 2D CNNs + RNN
47
Yue-Hei Ng, Joe, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, and George Toderici.
"Beyond short snippets: Deep networks for video classification." CVPR 2015
Outline
48
1. Architectures
2. Tips and tricks
Large-scale datasets
49
● The reference dataset for image classification, ImageNet, has ~1.3M
images
○ Training a state of the art CNN can take up to 2 weeks on a
single GPU
● Now imagine that we have an ‘ImageNet’ of 1.3M videos
○ Assuming videos of 30s at 24fps, we have 936M frames
○ This is 720x ImageNet!
● Videos exhibit a large redundancy in time
○ We can reduce the frame rate without losing too much
information
Tips & Tricks by Víctor Campos (2017)
Memory issues
50
● Current GPUs can fit batches of 32~64 images when training state of
the art CNNs
○ This means 32~64 video frames at once
● Memory footprint can be reduced in different ways if a pre-trained
CNN model is used
○ Freezing some of the lower layers, reducing the memory impact
of backprop
○ Extracting frame-level features and training a model on top of it
(e.g. RNN on top of CNN features). This is equivalent to freezing
the whole architecture, but the CNN part needs to be computed
only once.
Tips & Tricks by Víctor Campos (2017)
I/O bottleneck
51
● In practice, applying deep learning to video analysis requires from
multi-GPU or distributed settings
● In such settings it is very important to avoid starving the GPUs or we
will not obtain any speedup
○ The next batch needs to be loaded and preprocessed to keep
the GPU as busy as possible
○ Using asynchronous data loading pipelines is a key factor
○ Loading individual files is slow due to the introduced overhead,
so using other formats such as TFRecord/HDF5/LMDB is highly
recommended
Tips & Tricks by Víctor Campos (2017)
52
Learn more
Rohit Ghosh, “Deep Learning for Videos: A 2018 Guide for Action recognition” (2018)
53
Questions ?

Contenu connexe

Tendances

Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...
Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...
Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...Universitat Politècnica de Catalunya
 
Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019
Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019
Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019Universitat Politècnica de Catalunya
 
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...Universitat Politècnica de Catalunya
 
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...Universitat Politècnica de Catalunya
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Universitat Politècnica de Catalunya
 
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaSelf-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaUniversitat Politècnica de Catalunya
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Universitat Politècnica de Catalunya
 
Self-supervised Audiovisual Learning - Xavier Giro - UPC Barcelona 2019
Self-supervised Audiovisual Learning - Xavier Giro - UPC Barcelona 2019Self-supervised Audiovisual Learning - Xavier Giro - UPC Barcelona 2019
Self-supervised Audiovisual Learning - Xavier Giro - UPC Barcelona 2019Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...Universitat Politècnica de Catalunya
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Universitat Politècnica de Catalunya
 

Tendances (20)

Deep Learning from Videos (UPC 2018)
Deep Learning from Videos (UPC 2018)Deep Learning from Videos (UPC 2018)
Deep Learning from Videos (UPC 2018)
 
Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...
Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...
Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...
 
Deep Learning for Video: Action Recognition (UPC 2018)
Deep Learning for Video: Action Recognition (UPC 2018)Deep Learning for Video: Action Recognition (UPC 2018)
Deep Learning for Video: Action Recognition (UPC 2018)
 
Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019
Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019
Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019
 
Multimodal Deep Learning
Multimodal Deep LearningMultimodal Deep Learning
Multimodal Deep Learning
 
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
 
Deep Learning Representations for All (a.ka. the AI hype)
Deep Learning Representations for All (a.ka. the AI hype)Deep Learning Representations for All (a.ka. the AI hype)
Deep Learning Representations for All (a.ka. the AI hype)
 
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
 
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaSelf-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
 
One Perceptron to Rule Them All: Language and Vision
One Perceptron to Rule Them All: Language and VisionOne Perceptron to Rule Them All: Language and Vision
One Perceptron to Rule Them All: Language and Vision
 
Deep Learning for Video: Object Tracking (UPC 2018)
Deep Learning for Video: Object Tracking (UPC 2018)Deep Learning for Video: Object Tracking (UPC 2018)
Deep Learning for Video: Object Tracking (UPC 2018)
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
 
Self-supervised Audiovisual Learning - Xavier Giro - UPC Barcelona 2019
Self-supervised Audiovisual Learning - Xavier Giro - UPC Barcelona 2019Self-supervised Audiovisual Learning - Xavier Giro - UPC Barcelona 2019
Self-supervised Audiovisual Learning - Xavier Giro - UPC Barcelona 2019
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Deep Language and Vision by Amaia Salvador (Insight DCU 2018)
Deep Language and Vision by Amaia Salvador (Insight DCU 2018)Deep Language and Vision by Amaia Salvador (Insight DCU 2018)
Deep Language and Vision by Amaia Salvador (Insight DCU 2018)
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 
Skipping and Repeating Samples in Recurrent Neural Networks
Skipping and Repeating Samples in Recurrent Neural NetworksSkipping and Repeating Samples in Recurrent Neural Networks
Skipping and Repeating Samples in Recurrent Neural Networks
 
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
 

Similaire à Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019

Reservoir computing fast deep learning for sequences
Reservoir computing   fast deep learning for sequencesReservoir computing   fast deep learning for sequences
Reservoir computing fast deep learning for sequencesClaudio Gallicchio
 
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)Universitat Politècnica de Catalunya
 
Deep Neural Networks for Multimodal Learning
Deep Neural Networks for Multimodal LearningDeep Neural Networks for Multimodal Learning
Deep Neural Networks for Multimodal LearningMarc Bolaños Solà
 
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...Luba Elliott
 
Machine Learning approaches at video compression
Machine Learning approaches at video compression Machine Learning approaches at video compression
Machine Learning approaches at video compression Roberto Iacoviello
 
Research and activity report
Research and activity reportResearch and activity report
Research and activity reportMarco Cagnazzo
 
Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016
Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016
Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016Universitat Politècnica de Catalunya
 
Finding the best solution for Image Processing
Finding the best solution for Image ProcessingFinding the best solution for Image Processing
Finding the best solution for Image ProcessingTech Triveni
 
Dov Nimratz, Roman Chobik "Embedded artificial intelligence"
Dov Nimratz, Roman Chobik "Embedded artificial intelligence"Dov Nimratz, Roman Chobik "Embedded artificial intelligence"
Dov Nimratz, Roman Chobik "Embedded artificial intelligence"Lviv Startup Club
 
Transformer in Vision
Transformer in VisionTransformer in Vision
Transformer in VisionSangmin Woo
 
U_N.o.1T: A U-Net exploration, in Depth
U_N.o.1T: A U-Net exploration, in DepthU_N.o.1T: A U-Net exploration, in Depth
U_N.o.1T: A U-Net exploration, in DepthManuel Nieves Sáez
 
Meetup 18/10/2018 - Artificiële intelligentie en mobiliteit
Meetup 18/10/2018 - Artificiële intelligentie en mobiliteitMeetup 18/10/2018 - Artificiële intelligentie en mobiliteit
Meetup 18/10/2018 - Artificiële intelligentie en mobiliteitDigipolis Antwerpen
 
Supervised Learning of Sparsity-Promoting Regularizers for Denoising
Supervised Learning of Sparsity-Promoting Regularizers for DenoisingSupervised Learning of Sparsity-Promoting Regularizers for Denoising
Supervised Learning of Sparsity-Promoting Regularizers for DenoisingMike McCann
 
A Survey on Image Processing using CNN in Deep Learning
A Survey on Image Processing using CNN in Deep LearningA Survey on Image Processing using CNN in Deep Learning
A Survey on Image Processing using CNN in Deep LearningIRJET Journal
 
DLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep LearningDLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep LearningBrodmann17
 
Video Description using Deep Learning
Video Description using Deep LearningVideo Description using Deep Learning
Video Description using Deep LearningPranjalMahajan9
 
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerPoo Kuan Hoong
 

Similaire à Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019 (20)

Reservoir computing fast deep learning for sequences
Reservoir computing   fast deep learning for sequencesReservoir computing   fast deep learning for sequences
Reservoir computing fast deep learning for sequences
 
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
 
Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)
Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)
Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)
 
Deep Learning for Computer Vision: Video Analytics (UPC 2016)
Deep Learning for Computer Vision: Video Analytics (UPC 2016)Deep Learning for Computer Vision: Video Analytics (UPC 2016)
Deep Learning for Computer Vision: Video Analytics (UPC 2016)
 
Deep Neural Networks for Multimodal Learning
Deep Neural Networks for Multimodal LearningDeep Neural Networks for Multimodal Learning
Deep Neural Networks for Multimodal Learning
 
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
 
Machine Learning approaches at video compression
Machine Learning approaches at video compression Machine Learning approaches at video compression
Machine Learning approaches at video compression
 
Research and activity report
Research and activity reportResearch and activity report
Research and activity report
 
Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016
Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016
Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016
 
Finding the best solution for Image Processing
Finding the best solution for Image ProcessingFinding the best solution for Image Processing
Finding the best solution for Image Processing
 
Dov Nimratz, Roman Chobik "Embedded artificial intelligence"
Dov Nimratz, Roman Chobik "Embedded artificial intelligence"Dov Nimratz, Roman Chobik "Embedded artificial intelligence"
Dov Nimratz, Roman Chobik "Embedded artificial intelligence"
 
Transformer in Vision
Transformer in VisionTransformer in Vision
Transformer in Vision
 
U_N.o.1T: A U-Net exploration, in Depth
U_N.o.1T: A U-Net exploration, in DepthU_N.o.1T: A U-Net exploration, in Depth
U_N.o.1T: A U-Net exploration, in Depth
 
Meetup 18/10/2018 - Artificiële intelligentie en mobiliteit
Meetup 18/10/2018 - Artificiële intelligentie en mobiliteitMeetup 18/10/2018 - Artificiële intelligentie en mobiliteit
Meetup 18/10/2018 - Artificiële intelligentie en mobiliteit
 
Supervised Learning of Sparsity-Promoting Regularizers for Denoising
Supervised Learning of Sparsity-Promoting Regularizers for DenoisingSupervised Learning of Sparsity-Promoting Regularizers for Denoising
Supervised Learning of Sparsity-Promoting Regularizers for Denoising
 
A Survey on Image Processing using CNN in Deep Learning
A Survey on Image Processing using CNN in Deep LearningA Survey on Image Processing using CNN in Deep Learning
A Survey on Image Processing using CNN in Deep Learning
 
DLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep LearningDLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep Learning
 
Video Description using Deep Learning
Video Description using Deep LearningVideo Description using Deep Learning
Video Description using Deep Learning
 
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
 

Plus de Universitat Politècnica de Catalunya

The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...Universitat Politècnica de Catalunya
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoUniversitat Politècnica de Catalunya
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosUniversitat Politècnica de Catalunya
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Universitat Politècnica de Catalunya
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Universitat Politècnica de Catalunya
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Universitat Politècnica de Catalunya
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Universitat Politècnica de Catalunya
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Universitat Politècnica de Catalunya
 
Hate Speech in Pixels: Detection of Offensive Memes towards Automatic Moderation
Hate Speech in Pixels: Detection of Offensive Memes towards Automatic ModerationHate Speech in Pixels: Detection of Offensive Memes towards Automatic Moderation
Hate Speech in Pixels: Detection of Offensive Memes towards Automatic ModerationUniversitat Politècnica de Catalunya
 

Plus de Universitat Politècnica de Catalunya (19)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
 
Open challenges in sign language translation and production
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and production
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
 
Backpropagation for Deep Learning
Backpropagation for Deep LearningBackpropagation for Deep Learning
Backpropagation for Deep Learning
 
Hate Speech in Pixels: Detection of Offensive Memes towards Automatic Moderation
Hate Speech in Pixels: Detection of Offensive Memes towards Automatic ModerationHate Speech in Pixels: Detection of Offensive Memes towards Automatic Moderation
Hate Speech in Pixels: Detection of Offensive Memes towards Automatic Moderation
 

Dernier

Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 

Dernier (20)

Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 

Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019

  • 1. @DocXavi Xavier Giró-i-Nieto [http://pagines.uab.cat/mcv/] Module 6 Deep Learning for Video: Architectures 26th February 2019
  • 2. Our team Xavier Giro-i-Nieto • Web: https://imatge.upc.edu/web/people/xavier-giro Associate Professor at Universitat Politecnica de Catalunya (UPC) IDEAI Center for Intelligent Data Science & Artificial Intelligence
  • 3. Our team & Our Deep Research
  • 4. DL online courses by UPC TelecomBCN: You can audit this one. ● MSc course [2017] [2018] ● BSc course [2018] [2019] ● 1st edition (2016) ● 2nd edition (2017) ● 3rd edition (2018) ● 4th edition (2019) ● 1st edition (2017) ● 2nd edition (2018) ● 3rd edition - NLP (2019)
  • 5. Acknowledgements 5 Víctor Campos Alberto MontesAmaia Salvador Santiago Pascual
  • 6. Video lecture 6 Deep Learning for Computer Vision (UPC 2018) Víctor Campos victor.campos@bsc.es PhD Candidate Barcelona Supercomputing Center
  • 9. 9 Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. . Large-scale video classification with convolutional neural networks. CVPR 2014
  • 10. Motivation 10 Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. Large-scale video classification with convolutional neural networks. CVPR 2014.
  • 11. What is a video? 11 ● Formally, a video is a 3D signal ○ Spatial coordinates: x, y ○ Temporal coordinate: t ● If we fix t, we obtain an image. We can understand videos as sequences of images (a.k.a. frames)
  • 12. Convolutional Neural Networks (CNN) provide state of the art performance on image analysis tasks How do we work with images? 12
  • 13. How can we extend CNNs to image sequences? How do we work with videos ? 13 f ŷ
  • 14. 14 Basic deep architectures for video: 1. Single frame models 2. CNN + RNN 3. 3D convolutions 4. Two-stream CNN Deep Video Architectures
  • 15. 15 Basic deep architectures for video: 1. Single frame models 2. CNN + RNN 3. 3D convolutions 4. Two-stream CNN Deep Video Architectures
  • 16. Single frame models 16 CNN CNN CNN... Combination method Combination is commonly implemented as a small NN on top of a pooling operation (e.g. max, sum, average). Problem: pooling is not aware of the temporal order! Yue-Hei Ng, Joe, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, and George Toderici. "Beyond short snippets: Deep networks for video classification." CVPR 2015 ŷ
  • 17. 17 Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. . Large-scale video classification with convolutional neural networks. CVPR 2014 Single frame models
  • 18. 18 Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. . Large-scale video classification with convolutional neural networks. CVPR 2014 Single frame models
  • 19. 19 Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. . Large-scale video classification with convolutional neural networks. CVPR 2014 Single frame models
  • 20. 20 João Carreira, Viorica Pătrăucean, Laurent Mazare, Andrew Zisserman, Simon Osindero, “Massively Parallel Video Networks” ECCV 2018 [video] Single frame models: Temporal redundancy
  • 21. 21 Basic deep architectures for video: 1. Single frame models 2. CNN + RNN 3. 3D convolutions 4. Two-stream CNN Deep Video Architectures
  • 22. 22Slide credit: Santi Pascual If we have a sequence of samples... predict sample x[t+1] knowing previous values {x[t], x[t-1], x[t-2], …, x[t-τ]} Limitation of Feed Forward NN (as CNNs)
  • 23. 23Slide credit: Santi Pascual Feed Forward approach: ● static window of size L ● slide the window time-step wise ... ... ... x[t+1] x[t-L], …, x[t-1], x[t] x[t+1] L Limitation of Feed Forward NN (as CNNs)
  • 24. 24Slide credit: Santi Pascual Feed Forward approach: ● static window of size L ● slide the window time-step wise ... ... ... x[t+2] x[t-L+1], …, x[t], x[t+1] ... ... ... x[t+1] x[t-L], …, x[t-1], x[t] x[t+2] L Limitation of Feed Forward NN (as CNNs)
  • 25. 25Slide credit: Santi Pascual 25 Feed Forward approach: ● static window of size L ● slide the window time-step wise x[t+3] L ... ... ... x[t+3] x[t-L+2], …, x[t+1], x[t+2] ... ... ... x[t+2] x[t-L+1], …, x[t], x[t+1] ... ... ... x[t+1] x[t-L], …, x[t-1], x[t] Limitation of Feed Forward NN (as CNNs)
  • 26. ... ... ... x1, x2, …, xL Problems for the feed forward + static window approach: ● What’s the matter increasing L? → Fast growth of num of parameters! ● Decisions are independent between time-steps! ○ The network doesn’t care about what happened at previous time-step, only present window matters → doesn’t look good ● Cumbersome padding when there are not enough samples to fill L size ○ Can’t work with variable sequence lengths x1, x2, …, xL, …, x2L ... ... x1, x2, …, xL, …, x2L, …, x3L ... ... ... ... Slide credit: Santi Pascual Limitation of Feed Forward NN (as CNNs)
  • 27. 27 The hidden layers and the output depend from previous states of the hidden layers Recurrent Neural Network (RNN)
  • 28. 2D CNN + RNN 28 CNN CNN CNN... Recurrent Neural Networks are well suited for processing sequences. Problem: RNNs are sequential and cannot be parallelized. RNN RNN RNN... Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrel. Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR 2015. code Videolectures on RNNs: DLSL 2017, “RNN (I)” “RNN (II)” DLAI 2018, “RNN”
  • 29. 29 Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrel. Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR 2015. code 2D CNN + RNN
  • 30. 30 Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. “Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks”, ICLR 2018. 2D CNN + RNN: redundancy CNN CNN CNN... RNN RNN RNN... After processing a frame, let the RNN decide how many future frames can be skipped In skipped frames, simply copy the output and state from the previous time step There is no ground truth for which frames can be skipped. The RNN learns it by itself during training!
  • 31. 31 Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. “Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks”, ICLR 2018. Used Unused 2D CNN + RNN: redundancy
  • 32. 32 Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. “Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks”, ICLR 2018. 2D CNN + RNN Epochs for Skip LSTM (λ = 10-4 ) ~30% acc ~50% acc ~70% acc ~95% acc 11 epochs 51 epochs 101 epochs 400 epochs Used Unused
  • 33. 33 Basic deep architectures for video: 1. Single frame models 2. CNN + RNN 3. 3D convolutions 4. Two-stream CNN Deep Video Architectures
  • 34. 3D CNN (C3D) 34 We can add an extra dimension to standard CNNs: ● An image is a HxWxD tensor: MxNxD’ conv filters ● A video is a TxHxWxD tensor: KxMxNxD’ conv filters Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks" ICCV 2015
  • 35. 35 The video needs to be split into chunks (also known as clips) with a number of frames that fits the receptive field of the C3D. Usually clips have 16 frames. Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks" ICCV 2015 16-frame clip 16-frame clip 16-frame clip ... Average 4096-dimvideodescriptor 4096-dimvideodescriptor L2 norm 3D CNN (C3D)
  • 36. 36 3D convolutions + Residual connections Tran, Du, Jamie Ray, Zheng Shou, Shih-Fu Chang, and Manohar Paluri. "Convnet architecture search for spatiotemporal feature learning." arXiv preprint arXiv:1708.05038 (2017). [code] 3D CNN (Res3D)
  • 37. 37 Limitation: ● How can we use pre-trained networks? 3D CNN
  • 38. 38 Re-use pre-trained 2D CNNs for 3D CNNs We can add an extra dimension to standard CNNs: ● An image is a HxWxD tensor: MxNxD’ conv filters ● A video is a TxHxWxD tensor: KxMxNxD’ conv filters We can convert an MxNxD’ conv filter into a KxMxNxD’ filter by replicating it K times in the time axis and scaling its values by 1/K. ● This allows to leverage networks pre-trained on ImageNet and alleviate the computational burden associated to training from scratch Feichtenhofer et al., Spatiotemporal Residual Networks for Video Action Recognition, NIPS 2016 Carreira et al., Quo vadis, action recognition? A new model and the kinetics dataset, CVPR 2017
  • 39. 39 Limitations: ● How can we handle longer videos? ● How can we capture longer temporal dependencies? 3D CNN
  • 40. 40 3D CNN + RNN A. Montes, Salvador, A., Pascual-deLaPuente, S., and Giró-i-Nieto, X., “Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks”, NIPS Workshop 2016 (best poster award)
  • 41. 41 Basic deep architectures for video: 1. Single frame models 2. CNN + RNN 3. 3D convolutions 4. Two-stream CNN Deep Video Architectures
  • 42. Two-streams 2D CNNs 42 Problem: Single frame models do not take into account motion in videos. Solution: extract optical flow for a stack of frames and use it as an input to a CNN. Simonyan, Karen, and Andrew Zisserman. "Two-stream convolutional networks for action recognition in videos." NIPS 2014. Fusion
  • 43. Two-streams 2D CNNs 43Feichtenhofer, Christoph, Axel Pinz, and Andrew Zisserman. "Convolutional two-stream network fusion for video action recognition." CVPR 2016. [code] Fusion Problem: Single frame models do not take into account motion in videos. Solution: extract optical flow for a stack of frames and use it as an input to a CNN.
  • 44. Two-streams 2D CNNs 44 Feichtenhofer, Christoph, Axel Pinz, and Richard Wildes. "Spatiotemporal residual networks for video action recognition." NIPS 2016. [code] Problem: Single frame models do not take into account motion in videos. Solution: extract optical flow for a stack of frames and use it as an input to a CNN.
  • 45. Two-streams 3D CNNs 45 Carreira, J., & Zisserman, A. . Quo vadis, action recognition? A new model and the kinetics dataset. CVPR 2017. [code]
  • 46. Two-streams 3D CNNs 46 Carreira, J., & Zisserman, A. . Quo vadis, action recognition? A new model and the kinetics dataset. CVPR 2017. [code]
  • 47. Two-streams 2D CNNs + RNN 47 Yue-Hei Ng, Joe, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, and George Toderici. "Beyond short snippets: Deep networks for video classification." CVPR 2015
  • 49. Large-scale datasets 49 ● The reference dataset for image classification, ImageNet, has ~1.3M images ○ Training a state of the art CNN can take up to 2 weeks on a single GPU ● Now imagine that we have an ‘ImageNet’ of 1.3M videos ○ Assuming videos of 30s at 24fps, we have 936M frames ○ This is 720x ImageNet! ● Videos exhibit a large redundancy in time ○ We can reduce the frame rate without losing too much information Tips & Tricks by Víctor Campos (2017)
  • 50. Memory issues 50 ● Current GPUs can fit batches of 32~64 images when training state of the art CNNs ○ This means 32~64 video frames at once ● Memory footprint can be reduced in different ways if a pre-trained CNN model is used ○ Freezing some of the lower layers, reducing the memory impact of backprop ○ Extracting frame-level features and training a model on top of it (e.g. RNN on top of CNN features). This is equivalent to freezing the whole architecture, but the CNN part needs to be computed only once. Tips & Tricks by Víctor Campos (2017)
  • 51. I/O bottleneck 51 ● In practice, applying deep learning to video analysis requires from multi-GPU or distributed settings ● In such settings it is very important to avoid starving the GPUs or we will not obtain any speedup ○ The next batch needs to be loaded and preprocessed to keep the GPU as busy as possible ○ Using asynchronous data loading pipelines is a key factor ○ Loading individual files is slow due to the introduced overhead, so using other formats such as TFRecord/HDF5/LMDB is highly recommended Tips & Tricks by Víctor Campos (2017)
  • 52. 52 Learn more Rohit Ghosh, “Deep Learning for Videos: A 2018 Guide for Action recognition” (2018)