SlideShare a Scribd company logo
1 of 67
Download to read offline
Xavier Giro-i-Nieto
xavier.giro@upc.edu
Associate Professor
Universitat Politecnica de Catalunya
Technical University of Catalonia
Language and Vision
Day 4 Lecture 4
#DLUPC
http://bit.ly/dlcv2018
2
Acknowledgments
Antonio
Bonafonte
Santiago
Pascual
3
Acknowledgments
Marta R. Costa-jussà
4
Outline
1. Motivation
2. Image Captioning
3. Visual Question Answering / Reasoning
4. Joint Embeddings
5
Outline
1. Motivation
2. Image and Video Captioning
3. Visual Question Answering / Reasoning
4. Joint Embeddings
6
7[course site]
Text (English)
8
Text (French)
Neural Machine Translation
9
Neural Machine Translation
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
Representation or
Embedding
10
Representation or
Embedding
11
Representation or
Embedding
Encoder Decoder
bit.ly/InsightDL2018
#InsightDL2018
12
Encoder-Decoder
Front View Side View
Representation of the sentence
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua
Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint
arXiv:1406.1078 (2014).
13
Representation or
Embedding
Encoder
bit.ly/InsightDL2018
#InsightDL2018
14
Encoder in three steps
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
(2)
(3) (1) One hot encoding
(2) Word embedding
(3) Sequence summarization
bit.ly/InsightDL2018
#InsightDL2018
15
(1) One hot encoding
cat: xT
= [1,0,0, ..., 0]
dog: xT
= [0,1,0, ..., 0]
.
.
house: xT
= [0,0,0, …,0,1,0,...,0]
.
.
.
bit.ly/InsightDL2018
#InsightDL2018
16
(2) Word embeddings
Figure: Christopher Olah, Visualizing Representations
bit.ly/InsightDL2018
#InsightDL2018
17
(3) Sequence summarization
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
18
Representation or
Embedding
Decoder
bit.ly/InsightDL2018
#InsightDL2018
19
Decoder
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
The Recurrent State (zi
) of the decoder is determined by:
1) summary vector hT
2) previous output word ui-1
3) previous state zi-1
hT
bit.ly/InsightDL2018
#InsightDL2018
20
Decoder
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
With zi
updated, we can compute a probability pi
for each word i as an
output of the RNN:
hT
bit.ly/InsightDL2018
#InsightDL2018
21
Decoder
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
More words for the decoded sentence are generated until a <EOS> (End Of
Sentence) “word” is predicted.
EOS
22
Outline
1. Motivation
2. Image and Video Captioning
3. Visual Question Answering / Reasoning
4. Joint Embeddings
23
Representation or
Embedding
Encoder Decoder
24
Captioning: Show & Tell
Vinyals, Oriol, Alexander Toshev, Samy Bengio, and Dumitru Erhan. "Show and tell: A neural image caption
generator." CVPR 2015. [video]
25
Captioning: DeepImageSent
(Slides by Marc Bolaños): Karpathy, Andrej, and Li Fei-Fei. "Deep visual-semantic alignments for generating
image descriptions." CVPR 2015
26
Captioning: DeepImageSent
(Slides by Marc Bolaños): Karpathy, Andrej, and Li Fei-Fei. "Deep visual-semantic alignments for
generating image descriptions." CVPR 2015
only takes into account
image features in the first
hidden state
Multimodal Recurrent
Neural Network
Challenge on Multimodal Image Translation:
http://www.statmt.org/wmt17/multimodal-task.html#task1
Multimodal Machine Translation
28
Captioning: Show, Attend & Tell
Xu, Kelvin, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron C. Courville, Ruslan Salakhutdinov, Richard S.
Zemel, and Yoshua Bengio. "Show, Attend and Tell: Neural Image Caption Generation with Visual
Attention." ICML 2015
29
Captioning: Show, Attend & Tell
Xu, Kelvin, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron C. Courville, Ruslan Salakhutdinov, Richard S.
Zemel, and Yoshua Bengio. "Show, Attend and Tell: Neural Image Caption Generation with Visual
Attention." ICML 2015
30
Johnson, Justin, Andrej Karpathy, and Li Fei-Fei. "Densecap: Fully convolutional localization networks for
dense captioning." CVPR 2016
Captioning (+ Detection): DenseCap
31
Captioning (+ Detection): DenseCap
Johnson, Justin, Andrej Karpathy, and Li Fei-Fei. "Densecap: Fully convolutional localization networks for
dense captioning." CVPR 2016
32
Captioning (+ Detection): DenseCap
Johnson, Justin, Andrej Karpathy, and Li Fei-Fei. "Densecap: Fully convolutional localization networks for
dense captioning." CVPR 2016
XAVI: “man has
short hair”, “man
with short hair”
AMAIA:”a woman
wearing a black
shirt”, “
BOTH: “two men
wearing black
glasses”
33
Captioning: Video
Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan,
Kate Saenko, Trevor Darrel. Long-term Recurrent Convolutional Networks for Visual Recognition and
Description, CVPR 2015. code
34
(Slides by Marc Bolaños) Pingbo Pan, Zhongwen Xu, Yi Yang,Fei Wu,Yueting Zhuang Hierarchical
Recurrent Neural Encoder for Video Representation with Application to Captioning, CVPR 2016.
LSTM unit
(2nd layer)
Time
Image
t = 1 t = T
hidden state
at t = T
first chunk
of data
Captioning: Video
35
Chung, Joon Son, Andrew Senior, Oriol Vinyals, and Andrew Zisserman. "Lip reading sentences in the wild."
CVPR 2017
36
Lipreading: Watch, Listen, Attend & Spell
Chung, Joon Son, Andrew Senior, Oriol Vinyals, and Andrew Zisserman. "Lip reading sentences in the
wild." CVPR 2017
Audio
features
Image
features
37
Lipreading: Watch, Listen, Attend & Spell
Chung, Joon Son, Andrew Senior, Oriol Vinyals, and Andrew Zisserman. "Lip reading sentences in the
wild." CVPR 2017
Attention over output
states from audio and
video is computed at
each timestep
38
Assael, Yannis M., Brendan Shillingford, Shimon Whiteson, and Nando de Freitas. "LipNet: End-to-End
Sentence-level Lipreading." (2016).
Lip Reading: LipNet
Input (video frames) and output (sentence) sequences are not
aligned
39
Graves et al. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with
Recurrent Neural Networks. ICML 2006
Lip Reading: LipNet
CTC Loss: Connectionist temporal classification
● Avoiding the need for alignment between input and output sequence by predicting
an additional “_” blank word
● Before computing the loss, repeated words and blank tokens are removed
● “a _ a b _ ” == “_ a a _ _ b b” == “a a b”
40
Lip Reading: LipNet
Assael et al. LipNet: Sentence-level Lipreading. arXiv Nov 2016
41
Outline
1. Neural Machine Transaltion (no vision here !)
2. Image and Video Captioning
3. Visual Question Answering / Reasoning
4. Joint Embeddings
42
Visual Question Answering (VQA)
[z1
, z2
, … zN
] [y1
, y2
, … yM
]
“Is economic growth decreasing ?”
“Yes”
Encode
Encode
Decode
43
Antol, Stanislaw, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and
Devi Parikh. "VQA: Visual question answering." CVPR 2015.
Visual Question Answering (VQA)
44
Extract visual
features
Embedding
Predict answerMerge
Question
What object is flying?
Answer
Kite
Slide credit: Issey Masuda
Visual Question Answering (VQA)
45
Visual Question Answering (VQA)
Masuda, Issey, Santiago Pascual de la Puente, and Xavier Giro-i-Nieto. "Open-Ended Visual
Question-Answering." ETSETB UPC TelecomBCN (2016).
Image
Question
Answer
46
Visual Question Answering (VQA)
Francisco Roldán, Issey Masuda, Santiago Pascual de la Puente, and Xavier Giro-i-Nieto.
"Visual Question-Answering 2.0." ETSETB UPC TelecomBCN (2017).
47
Noh, H., Seo, P. H., & Han, B. Image question answering using convolutional neural network with
dynamic parameter prediction. CVPR 2016
Dynamic Parameter Prediction Network (DPPnet)
Visual Question Answering (VQA)
48
Visual Question Answering: Dynamic
(Slides and Slidecast by Santi Pascual): Xiong, Caiming, Stephen Merity, and Richard Socher. "Dynamic
Memory Networks for Visual and Textual Question Answering." ICML 2016
49
Visual Question Answering: Grounded
(Slides and Screencast by Issey Masuda): Zhu, Yuke, Oliver Groth, Michael Bernstein, and Li Fei-Fei."Visual7W: Grounded
Question Answering in Images." CVPR 2016.
50
Visual Dialog (Image Guessing Game)
Das, Abhishek, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José MF Moura, Devi Parikh, and Dhruv Batra.
"Visual Dialog." CVPR 2017
51
Visual Dialog (Image Guessing Game)
Das, Abhishek, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José MF Moura, Devi Parikh, and Dhruv Batra.
"Visual Dialog." CVPR 2017
52
Visual Reasoning
Johnson, Justin, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, and Ross
Girshick. "CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning."
CVPR 2017
53
Visual Reasoning
(Slides by Fran Roldan) Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Fei-Fei Li, Larry
Zitnick, Ross Girshick , “Inferring and Executing Programs for Visual Reasoning”. ICCV 2017
Program Generator Execution Engine
54
Outline
1. Neural Machine Transaltion (no vision here !)
2. Image and Video Captioning
3. Visual Question Answering / Reasoning
4. Joint Embeddings
55
Representation or
EmbeddingEncoder
Joint Neural Embeddings
Encoder
56
Frome, Andrea, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean, and Tomas Mikolov. "Devise: A
deep visual-semantic embedding model." NIPS 2013
Joint Neural Embeddings
57
Socher, R., Ganjoo, M., Manning, C. D., & Ng, A., Zero-shot learning through cross-modal transfer.
NIPS 2013 [slides] [code]
Zero-shot learning:
a class not present in the
training set of images
can be predicted
(eg. no images from
“cat” in the training set)
Joint Neural Embeddings
58
Alejandro Woodward, Víctor Campos, Dèlia Fernàndez, Jordi Torres, Xavier Giró-i-Nieto,
Brendan Jou and Shih-Fu Chang (submitted)
Joint Neural Embeddings
Foggy Day
59
60
Amaia Salvador, Nicholas Haynes, Yusuf Aytar, Javier Marín, Ferda Ofli, Ingmar Weber,
Antonio Torralba, “Learning Cross-modal Embeddings for Cooking Recipes and Food
Images”. CVPR 2017
Image and text retrieval with joint embeddings.
Joint Neural Embeddings
61
Amaia Salvador, Nicholas Haynes, Yusuf Aytar, Javier Marín, Ferda Ofli, Ingmar Weber,
Antonio Torralba, “Learning Cross-modal Embeddings for Cooking Recipes and Food
Images”. CVPR 2017
Joint Neural Embeddings
62
Amaia Salvador, Nicholas Haynes, Yusuf Aytar, Javier Marín, Ferda Ofli, Ingmar Weber,
Antonio Torralba, “Learning Cross-modal Embeddings for Cooking Recipes and Food
Images”. CVPR 2017
Joint Neural Embeddings
joint
embedding
LSTM Bidirectional LSTM
63
Image to image and text
Aytar, Yusuf, Lluis Castrejon, Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba. "Cross-Modal
Scene Networks." CVPR 2016.
64
Image to image and text
Aytar, Yusuf, Lluis Castrejon, Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba. "Cross-Modal
Scene Networks." CVPR 2016.
Gella, Spandana, Rico Sennrich, Frank Keller, and Mirella Lapata. "Image Pivoting for Learning Multilingual Multimodal Representations." arXiv preprint
arXiv:1707.07601 (2017).
Janarthanan Rajendran, Mitesh M Khapra, Sarath Chandar, Balaraman Ravindran, Bridge Correlational Neural Networks for Multilingual Multimodal
Representation Learning NAACL, 2016
Multilingual & Multimodal Embeddings
66
Outline
1. Motivation
2. Image Captioning
3. Visual Question Answering / Reasoning
4. Joint Embeddings
67
Questions ?

More Related Content

What's hot

Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...Universitat Politècnica de Catalunya
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Universitat Politècnica de Catalunya
 
Language and Vision (D2L11 Insight@DCU Machine Learning Workshop 2017)
Language and Vision (D2L11 Insight@DCU Machine Learning Workshop 2017)Language and Vision (D2L11 Insight@DCU Machine Learning Workshop 2017)
Language and Vision (D2L11 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...
Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...
Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...Universitat Politècnica de Catalunya
 
Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...
Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...
Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...Universitat Politècnica de Catalunya
 
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...Universitat Politècnica de Catalunya
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Universitat Politècnica de Catalunya
 
Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)
Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)
Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)Universitat Politècnica de Catalunya
 
Learning with Videos (D4L4 2017 UPC Deep Learning for Computer Vision)
Learning with Videos  (D4L4 2017 UPC Deep Learning for Computer Vision)Learning with Videos  (D4L4 2017 UPC Deep Learning for Computer Vision)
Learning with Videos (D4L4 2017 UPC Deep Learning for Computer Vision)Universitat Politècnica de Catalunya
 

What's hot (20)

Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
 
Language and Vision (D2L11 Insight@DCU Machine Learning Workshop 2017)
Language and Vision (D2L11 Insight@DCU Machine Learning Workshop 2017)Language and Vision (D2L11 Insight@DCU Machine Learning Workshop 2017)
Language and Vision (D2L11 Insight@DCU Machine Learning Workshop 2017)
 
Speaker ID II (D4L1 Deep Learning for Speech and Language UPC 2017)
Speaker ID II (D4L1 Deep Learning for Speech and Language UPC 2017)Speaker ID II (D4L1 Deep Learning for Speech and Language UPC 2017)
Speaker ID II (D4L1 Deep Learning for Speech and Language UPC 2017)
 
Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...
Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...
Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...
 
One Perceptron to Rule Them All: Language and Vision
One Perceptron to Rule Them All: Language and VisionOne Perceptron to Rule Them All: Language and Vision
One Perceptron to Rule Them All: Language and Vision
 
Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...
Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...
Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...
 
Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)
Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)
Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)
 
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
 
Video + Language
Video + LanguageVideo + Language
Video + Language
 
Vision and Language: Past, Present and Future
Vision and Language: Past, Present and FutureVision and Language: Past, Present and Future
Vision and Language: Past, Present and Future
 
Deep Learning for Video: Language (UPC 2018)
Deep Learning for Video: Language (UPC 2018)Deep Learning for Video: Language (UPC 2018)
Deep Learning for Video: Language (UPC 2018)
 
Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)
Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)
Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)
 
Deep Video Object Segmentation - Xavier Giro - UPC Barcelona 2019
Deep Video Object Segmentation - Xavier Giro - UPC Barcelona 2019Deep Video Object Segmentation - Xavier Giro - UPC Barcelona 2019
Deep Video Object Segmentation - Xavier Giro - UPC Barcelona 2019
 
Learning with Videos (D4L4 2017 UPC Deep Learning for Computer Vision)
Learning with Videos  (D4L4 2017 UPC Deep Learning for Computer Vision)Learning with Videos  (D4L4 2017 UPC Deep Learning for Computer Vision)
Learning with Videos (D4L4 2017 UPC Deep Learning for Computer Vision)
 
Deep Learning from Videos (UPC 2018)
Deep Learning from Videos (UPC 2018)Deep Learning from Videos (UPC 2018)
Deep Learning from Videos (UPC 2018)
 
Learning with Unpaired Data
Learning with Unpaired DataLearning with Unpaired Data
Learning with Unpaired Data
 
Deep Language and Vision by Amaia Salvador (Insight DCU 2018)
Deep Language and Vision by Amaia Salvador (Insight DCU 2018)Deep Language and Vision by Amaia Salvador (Insight DCU 2018)
Deep Language and Vision by Amaia Salvador (Insight DCU 2018)
 
Welcome (D1L1 2017 UPC Deep Learning for Computer Vision)
Welcome (D1L1 2017 UPC Deep Learning for Computer Vision)Welcome (D1L1 2017 UPC Deep Learning for Computer Vision)
Welcome (D1L1 2017 UPC Deep Learning for Computer Vision)
 

Similar to Deep Language and Vision - Xavier Giro-i-Nieto - UPC Barcelona 2018

One Perceptron to Rule them All: Deep Learning for Multimedia #A2IC2018
One Perceptron  to Rule them All: Deep Learning for Multimedia #A2IC2018One Perceptron  to Rule them All: Deep Learning for Multimedia #A2IC2018
One Perceptron to Rule them All: Deep Learning for Multimedia #A2IC2018Universitat Politècnica de Catalunya
 
One Perceptron to Rule Them All (Re-Work Deep Learning Summit, London 2017)
One Perceptron to Rule Them All (Re-Work Deep Learning Summit, London 2017)One Perceptron to Rule Them All (Re-Work Deep Learning Summit, London 2017)
One Perceptron to Rule Them All (Re-Work Deep Learning Summit, London 2017)Universitat Politècnica de Catalunya
 
Multi modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed modelsMulti modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed modelsRoelof Pieters
 
Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)
Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)
Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...Universitat Politècnica de Catalunya
 
Sequence to Sequence Learning with Neural Networks
Sequence to Sequence Learning with Neural NetworksSequence to Sequence Learning with Neural Networks
Sequence to Sequence Learning with Neural NetworksNguyen Quang
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoUniversitat Politècnica de Catalunya
 
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)Universitat Politècnica de Catalunya
 
ICQE20: Quantitative Ethnography Visualizations as Tools for Thinking
ICQE20: Quantitative Ethnography Visualizations as Tools for ThinkingICQE20: Quantitative Ethnography Visualizations as Tools for Thinking
ICQE20: Quantitative Ethnography Visualizations as Tools for ThinkingSimon Buckingham Shum
 
Multi-modal Neural Machine Translation - Iacer Calixto
Multi-modal Neural Machine Translation - Iacer CalixtoMulti-modal Neural Machine Translation - Iacer Calixto
Multi-modal Neural Machine Translation - Iacer CalixtoSebastian Ruder
 
Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016
Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016
Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016Universitat Politècnica de Catalunya
 
Interactive Video Search: Where is the User in the Age of Deep Learning?
Interactive Video Search: Where is the User in the Age of Deep Learning?Interactive Video Search: Where is the User in the Age of Deep Learning?
Interactive Video Search: Where is the User in the Age of Deep Learning?klschoef
 
Predicting Media Memorability with Audio, Video, and Text representations
Predicting Media Memorability with Audio, Video, and Text representationsPredicting Media Memorability with Audio, Video, and Text representations
Predicting Media Memorability with Audio, Video, and Text representationsAlison Reboud
 
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019Universitat Politècnica de Catalunya
 
Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!Roelof Pieters
 
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2Karthik Murugesan
 

Similar to Deep Language and Vision - Xavier Giro-i-Nieto - UPC Barcelona 2018 (20)

One Perceptron to Rule them All: Deep Learning for Multimedia #A2IC2018
One Perceptron  to Rule them All: Deep Learning for Multimedia #A2IC2018One Perceptron  to Rule them All: Deep Learning for Multimedia #A2IC2018
One Perceptron to Rule them All: Deep Learning for Multimedia #A2IC2018
 
One Perceptron to Rule Them All (Re-Work Deep Learning Summit, London 2017)
One Perceptron to Rule Them All (Re-Work Deep Learning Summit, London 2017)One Perceptron to Rule Them All (Re-Work Deep Learning Summit, London 2017)
One Perceptron to Rule Them All (Re-Work Deep Learning Summit, London 2017)
 
Multi modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed modelsMulti modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed models
 
Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)
Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)
Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)
 
Recurrent Neural Networks
Recurrent Neural NetworksRecurrent Neural Networks
Recurrent Neural Networks
 
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
 
Sequence to Sequence Learning with Neural Networks
Sequence to Sequence Learning with Neural NetworksSequence to Sequence Learning with Neural Networks
Sequence to Sequence Learning with Neural Networks
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
 
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
 
ICQE20: Quantitative Ethnography Visualizations as Tools for Thinking
ICQE20: Quantitative Ethnography Visualizations as Tools for ThinkingICQE20: Quantitative Ethnography Visualizations as Tools for Thinking
ICQE20: Quantitative Ethnography Visualizations as Tools for Thinking
 
Multi-modal Neural Machine Translation - Iacer Calixto
Multi-modal Neural Machine Translation - Iacer CalixtoMulti-modal Neural Machine Translation - Iacer Calixto
Multi-modal Neural Machine Translation - Iacer Calixto
 
Open challenges in sign language translation and production
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and production
 
Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016
Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016
Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016
 
Interactive Video Search: Where is the User in the Age of Deep Learning?
Interactive Video Search: Where is the User in the Age of Deep Learning?Interactive Video Search: Where is the User in the Age of Deep Learning?
Interactive Video Search: Where is the User in the Age of Deep Learning?
 
Predicting Media Memorability with Audio, Video, and Text representations
Predicting Media Memorability with Audio, Video, and Text representationsPredicting Media Memorability with Audio, Video, and Text representations
Predicting Media Memorability with Audio, Video, and Text representations
 
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
 
Deep Learning for Computer Vision: Video Analytics (UPC 2016)
Deep Learning for Computer Vision: Video Analytics (UPC 2016)Deep Learning for Computer Vision: Video Analytics (UPC 2016)
Deep Learning for Computer Vision: Video Analytics (UPC 2016)
 
Neural Architectures for Video Encoding
Neural Architectures for Video EncodingNeural Architectures for Video Encoding
Neural Architectures for Video Encoding
 
Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!
 
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
 

More from Universitat Politècnica de Catalunya

The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...Universitat Politècnica de Catalunya
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosUniversitat Politècnica de Catalunya
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Universitat Politècnica de Catalunya
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Universitat Politècnica de Catalunya
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Universitat Politècnica de Catalunya
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Universitat Politècnica de Catalunya
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Universitat Politècnica de Catalunya
 
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaSelf-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaUniversitat Politècnica de Catalunya
 
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN BarcelonaDeep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN BarcelonaUniversitat Politècnica de Catalunya
 

More from Universitat Politècnica de Catalunya (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
 
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaSelf-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
 
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN BarcelonaDeep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
 

Recently uploaded

Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...gajnagarg
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...gajnagarg
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...amitlee9823
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...gajnagarg
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 

Recently uploaded (20)

Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 

Deep Language and Vision - Xavier Giro-i-Nieto - UPC Barcelona 2018

  • 1. Xavier Giro-i-Nieto xavier.giro@upc.edu Associate Professor Universitat Politecnica de Catalunya Technical University of Catalonia Language and Vision Day 4 Lecture 4 #DLUPC http://bit.ly/dlcv2018
  • 4. 4 Outline 1. Motivation 2. Image Captioning 3. Visual Question Answering / Reasoning 4. Joint Embeddings
  • 5. 5 Outline 1. Motivation 2. Image and Video Captioning 3. Visual Question Answering / Reasoning 4. Joint Embeddings
  • 6. 6
  • 9. 9 Neural Machine Translation Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015) Representation or Embedding
  • 12. bit.ly/InsightDL2018 #InsightDL2018 12 Encoder-Decoder Front View Side View Representation of the sentence Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015) Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014).
  • 14. bit.ly/InsightDL2018 #InsightDL2018 14 Encoder in three steps Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015) (2) (3) (1) One hot encoding (2) Word embedding (3) Sequence summarization
  • 15. bit.ly/InsightDL2018 #InsightDL2018 15 (1) One hot encoding cat: xT = [1,0,0, ..., 0] dog: xT = [0,1,0, ..., 0] . . house: xT = [0,0,0, …,0,1,0,...,0] . . .
  • 16. bit.ly/InsightDL2018 #InsightDL2018 16 (2) Word embeddings Figure: Christopher Olah, Visualizing Representations
  • 17. bit.ly/InsightDL2018 #InsightDL2018 17 (3) Sequence summarization Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
  • 19. bit.ly/InsightDL2018 #InsightDL2018 19 Decoder Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015) The Recurrent State (zi ) of the decoder is determined by: 1) summary vector hT 2) previous output word ui-1 3) previous state zi-1 hT
  • 20. bit.ly/InsightDL2018 #InsightDL2018 20 Decoder Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015) With zi updated, we can compute a probability pi for each word i as an output of the RNN: hT
  • 21. bit.ly/InsightDL2018 #InsightDL2018 21 Decoder Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015) More words for the decoded sentence are generated until a <EOS> (End Of Sentence) “word” is predicted. EOS
  • 22. 22 Outline 1. Motivation 2. Image and Video Captioning 3. Visual Question Answering / Reasoning 4. Joint Embeddings
  • 24. 24 Captioning: Show & Tell Vinyals, Oriol, Alexander Toshev, Samy Bengio, and Dumitru Erhan. "Show and tell: A neural image caption generator." CVPR 2015. [video]
  • 25. 25 Captioning: DeepImageSent (Slides by Marc Bolaños): Karpathy, Andrej, and Li Fei-Fei. "Deep visual-semantic alignments for generating image descriptions." CVPR 2015
  • 26. 26 Captioning: DeepImageSent (Slides by Marc Bolaños): Karpathy, Andrej, and Li Fei-Fei. "Deep visual-semantic alignments for generating image descriptions." CVPR 2015 only takes into account image features in the first hidden state Multimodal Recurrent Neural Network
  • 27. Challenge on Multimodal Image Translation: http://www.statmt.org/wmt17/multimodal-task.html#task1 Multimodal Machine Translation
  • 28. 28 Captioning: Show, Attend & Tell Xu, Kelvin, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron C. Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio. "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention." ICML 2015
  • 29. 29 Captioning: Show, Attend & Tell Xu, Kelvin, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron C. Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio. "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention." ICML 2015
  • 30. 30 Johnson, Justin, Andrej Karpathy, and Li Fei-Fei. "Densecap: Fully convolutional localization networks for dense captioning." CVPR 2016 Captioning (+ Detection): DenseCap
  • 31. 31 Captioning (+ Detection): DenseCap Johnson, Justin, Andrej Karpathy, and Li Fei-Fei. "Densecap: Fully convolutional localization networks for dense captioning." CVPR 2016
  • 32. 32 Captioning (+ Detection): DenseCap Johnson, Justin, Andrej Karpathy, and Li Fei-Fei. "Densecap: Fully convolutional localization networks for dense captioning." CVPR 2016 XAVI: “man has short hair”, “man with short hair” AMAIA:”a woman wearing a black shirt”, “ BOTH: “two men wearing black glasses”
  • 33. 33 Captioning: Video Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrel. Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR 2015. code
  • 34. 34 (Slides by Marc Bolaños) Pingbo Pan, Zhongwen Xu, Yi Yang,Fei Wu,Yueting Zhuang Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning, CVPR 2016. LSTM unit (2nd layer) Time Image t = 1 t = T hidden state at t = T first chunk of data Captioning: Video
  • 35. 35 Chung, Joon Son, Andrew Senior, Oriol Vinyals, and Andrew Zisserman. "Lip reading sentences in the wild." CVPR 2017
  • 36. 36 Lipreading: Watch, Listen, Attend & Spell Chung, Joon Son, Andrew Senior, Oriol Vinyals, and Andrew Zisserman. "Lip reading sentences in the wild." CVPR 2017 Audio features Image features
  • 37. 37 Lipreading: Watch, Listen, Attend & Spell Chung, Joon Son, Andrew Senior, Oriol Vinyals, and Andrew Zisserman. "Lip reading sentences in the wild." CVPR 2017 Attention over output states from audio and video is computed at each timestep
  • 38. 38 Assael, Yannis M., Brendan Shillingford, Shimon Whiteson, and Nando de Freitas. "LipNet: End-to-End Sentence-level Lipreading." (2016). Lip Reading: LipNet Input (video frames) and output (sentence) sequences are not aligned
  • 39. 39 Graves et al. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. ICML 2006 Lip Reading: LipNet CTC Loss: Connectionist temporal classification ● Avoiding the need for alignment between input and output sequence by predicting an additional “_” blank word ● Before computing the loss, repeated words and blank tokens are removed ● “a _ a b _ ” == “_ a a _ _ b b” == “a a b”
  • 40. 40 Lip Reading: LipNet Assael et al. LipNet: Sentence-level Lipreading. arXiv Nov 2016
  • 41. 41 Outline 1. Neural Machine Transaltion (no vision here !) 2. Image and Video Captioning 3. Visual Question Answering / Reasoning 4. Joint Embeddings
  • 42. 42 Visual Question Answering (VQA) [z1 , z2 , … zN ] [y1 , y2 , … yM ] “Is economic growth decreasing ?” “Yes” Encode Encode Decode
  • 43. 43 Antol, Stanislaw, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. "VQA: Visual question answering." CVPR 2015. Visual Question Answering (VQA)
  • 44. 44 Extract visual features Embedding Predict answerMerge Question What object is flying? Answer Kite Slide credit: Issey Masuda Visual Question Answering (VQA)
  • 45. 45 Visual Question Answering (VQA) Masuda, Issey, Santiago Pascual de la Puente, and Xavier Giro-i-Nieto. "Open-Ended Visual Question-Answering." ETSETB UPC TelecomBCN (2016). Image Question Answer
  • 46. 46 Visual Question Answering (VQA) Francisco Roldán, Issey Masuda, Santiago Pascual de la Puente, and Xavier Giro-i-Nieto. "Visual Question-Answering 2.0." ETSETB UPC TelecomBCN (2017).
  • 47. 47 Noh, H., Seo, P. H., & Han, B. Image question answering using convolutional neural network with dynamic parameter prediction. CVPR 2016 Dynamic Parameter Prediction Network (DPPnet) Visual Question Answering (VQA)
  • 48. 48 Visual Question Answering: Dynamic (Slides and Slidecast by Santi Pascual): Xiong, Caiming, Stephen Merity, and Richard Socher. "Dynamic Memory Networks for Visual and Textual Question Answering." ICML 2016
  • 49. 49 Visual Question Answering: Grounded (Slides and Screencast by Issey Masuda): Zhu, Yuke, Oliver Groth, Michael Bernstein, and Li Fei-Fei."Visual7W: Grounded Question Answering in Images." CVPR 2016.
  • 50. 50 Visual Dialog (Image Guessing Game) Das, Abhishek, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José MF Moura, Devi Parikh, and Dhruv Batra. "Visual Dialog." CVPR 2017
  • 51. 51 Visual Dialog (Image Guessing Game) Das, Abhishek, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José MF Moura, Devi Parikh, and Dhruv Batra. "Visual Dialog." CVPR 2017
  • 52. 52 Visual Reasoning Johnson, Justin, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, and Ross Girshick. "CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning." CVPR 2017
  • 53. 53 Visual Reasoning (Slides by Fran Roldan) Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Fei-Fei Li, Larry Zitnick, Ross Girshick , “Inferring and Executing Programs for Visual Reasoning”. ICCV 2017 Program Generator Execution Engine
  • 54. 54 Outline 1. Neural Machine Transaltion (no vision here !) 2. Image and Video Captioning 3. Visual Question Answering / Reasoning 4. Joint Embeddings
  • 56. 56 Frome, Andrea, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean, and Tomas Mikolov. "Devise: A deep visual-semantic embedding model." NIPS 2013 Joint Neural Embeddings
  • 57. 57 Socher, R., Ganjoo, M., Manning, C. D., & Ng, A., Zero-shot learning through cross-modal transfer. NIPS 2013 [slides] [code] Zero-shot learning: a class not present in the training set of images can be predicted (eg. no images from “cat” in the training set) Joint Neural Embeddings
  • 58. 58 Alejandro Woodward, Víctor Campos, Dèlia Fernàndez, Jordi Torres, Xavier Giró-i-Nieto, Brendan Jou and Shih-Fu Chang (submitted) Joint Neural Embeddings Foggy Day
  • 59. 59
  • 60. 60 Amaia Salvador, Nicholas Haynes, Yusuf Aytar, Javier Marín, Ferda Ofli, Ingmar Weber, Antonio Torralba, “Learning Cross-modal Embeddings for Cooking Recipes and Food Images”. CVPR 2017 Image and text retrieval with joint embeddings. Joint Neural Embeddings
  • 61. 61 Amaia Salvador, Nicholas Haynes, Yusuf Aytar, Javier Marín, Ferda Ofli, Ingmar Weber, Antonio Torralba, “Learning Cross-modal Embeddings for Cooking Recipes and Food Images”. CVPR 2017 Joint Neural Embeddings
  • 62. 62 Amaia Salvador, Nicholas Haynes, Yusuf Aytar, Javier Marín, Ferda Ofli, Ingmar Weber, Antonio Torralba, “Learning Cross-modal Embeddings for Cooking Recipes and Food Images”. CVPR 2017 Joint Neural Embeddings joint embedding LSTM Bidirectional LSTM
  • 63. 63 Image to image and text Aytar, Yusuf, Lluis Castrejon, Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba. "Cross-Modal Scene Networks." CVPR 2016.
  • 64. 64 Image to image and text Aytar, Yusuf, Lluis Castrejon, Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba. "Cross-Modal Scene Networks." CVPR 2016.
  • 65. Gella, Spandana, Rico Sennrich, Frank Keller, and Mirella Lapata. "Image Pivoting for Learning Multilingual Multimodal Representations." arXiv preprint arXiv:1707.07601 (2017). Janarthanan Rajendran, Mitesh M Khapra, Sarath Chandar, Balaraman Ravindran, Bridge Correlational Neural Networks for Multilingual Multimodal Representation Learning NAACL, 2016 Multilingual & Multimodal Embeddings
  • 66. 66 Outline 1. Motivation 2. Image Captioning 3. Visual Question Answering / Reasoning 4. Joint Embeddings