Roelof Pieters
“Zero-Shot Learning Through Cross-Modal Transfer”
20 February 2015
Deep Learning Reading Group

Review of:
Richard Socher, Milind Ganjoo, Hamsa Sridhar, Osbert Bastani, Christopher D. Manning, Andrew Y. Ng
ICLR 2013: http://arxiv.org/abs/1301.3666
NIPS 2013: http://papers.nips.cc/…

@graphific
http://www.csc.kth.se/~roelof/
Core (novel) Idea:
“a zero-shot model that can predict both seen and unseen classes”

Zero-Shot Learning Through Cross-Modal Transfer
Richard Socher, Milind Ganjoo, Hamsa Sridhar, Osbert Bastani, Christopher D. Manning, Andrew Y. Ng, ICLR 2013
Key Ideas
• Semantic word vector representations:
  • allow transfer of knowledge between modalities,
  • even when these representations are learned in an unsupervised way
• Bayesian framework:
  1. differentiates between unseen and seen classes,
  2. based on where an image falls relative to the semantic manifold of the trained classes,
  3. allowing zero-shot and seen classification to be combined into one framework (see the decomposition below)

[ Ø-shot + 1-shot = “multi-shot” :) ]
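In the paper's notation this is a mixture over a binary visibility variable V ∈ {s, u} (seen/unseen); a sketch of the decomposition, with the paper's extra conditioning variables omitted:

$$P(y \mid x) = \sum_{V \in \{s,\,u\}} P(y \mid V, x)\; P(V \mid x)$$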
Visual-Semantic Word Space
• Word vectors capture distributional similarities from a large text corpus in an unsupervised way. [Word vectors create a semantic space] (Huang et al. 2012)
• Images are mapped into this semantic space of words, using features learned by a neural network model (Coates & Ng 2011; Coates et al. 2011)
• By learning an image mapping into this space, the word vectors get implicitly grounded by the visual modality, allowing us to give prototypical instances for various words (Socher et al. 2013, this paper) — see the sketch below
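A minimal sketch of the core zero-shot idea (all names hypothetical; assumes an image has already been mapped into the word-vector space): classify by the nearest class word vector, so an unseen class only needs a word vector, never a labeled training image.

```python
import numpy as np

def nearest_class(image_vec, class_word_vectors):
    """Zero-shot classification: pick the class whose word vector is
    most cosine-similar to the image's point in the semantic space."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(class_word_vectors, key=lambda y: cos(image_vec, class_word_vectors[y]))

# Toy usage: 'truck' has no training images, only a (random stand-in) word vector.
rng = np.random.default_rng(0)
classes = {c: rng.normal(size=50) for c in ["cat", "dog", "truck"]}
mapped_image = classes["truck"] + 0.1 * rng.normal(size=50)  # pretend NN output
print(nearest_class(mapped_image, classes))  # -> 'truck'
```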
[Diagram, repeated across build-up slides: the word vectors span a Semantic Space]
• (Socher et al. 2013) uses the pre-trained (50-d) word vectors of (Huang et al. 2012):
E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng. Improving Word Representations via Global Context and Multiple Word Prototypes. In ACL, 2012
http://www.socher.org/index.php/Main/ImprovingWordRepresentationsViaGlobalContextAndMultipleWordPrototypes
[Figure: the semantic word space (Huang et al. 2012)]
Distributed Semantics (short recap)

“You shall know a word by the company it keeps” (J. R. Firth 1957)

One of the most successful ideas of modern statistical NLP!

[Example: the context words around “banking” come to represent its meaning]
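A toy illustration of that idea (not from the slides; corpus and window size are arbitrary): words that occur in similar contexts end up with similar co-occurrence vectors.

```python
import numpy as np

corpus = ("the bank approved the loan . the bank issued a loan . "
          "the credit union approved a loan .").split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences within a +/-2 word window.
C = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(corpus):
    for j in range(max(0, i - 2), min(len(corpus), i + 3)):
        if j != i:
            C[idx[w], idx[corpus[j]]] += 1

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# 'bank' and 'union' share contexts (approved ... loan), so their rows align.
print(cos(C[idx["bank"]], C[idx["union"]]))
```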
[Figure: Huang et al. 2012]
local context, global context, final score: [equations rendered as images in the slides; Huang et al. 2012 — reconstructed below]
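Reconstructed from Huang et al. (2012): the final score of a word sequence s in document d is the sum of the local- and global-context scores (each defined on the next slides), and the model is trained with a margin ranking loss against corrupted sequences s^w whose last word is replaced by a random word w:

$$\mathrm{score}(s, d) = \mathrm{score}_l(s) + \mathrm{score}_g(s, d)$$
$$C = \sum_{(s,d)} \sum_{w \in V} \max\bigl(0,\; 1 - \mathrm{score}(s, d) + \mathrm{score}(s^w, d)\bigr)$$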
a1 — activation of the hidden layer with h hidden nodes
W1 — 1st-layer weights
W2 — 2nd-layer weights
b1 — 1st-layer bias
b2 — 2nd-layer bias
score_l — local-context scoring function
f — element-wise activation function (e.g. tanh)
[x1; …; xm] — concatenation of the m word embeddings representing sequence s
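Putting the legend together, the local-context score (reconstruction of the slide's image equation, following Huang et al. 2012):

$$a_1 = f\bigl(W_1\,[x_1;\,x_2;\,\ldots;\,x_m] + b_1\bigr), \qquad \mathrm{score}_l(s) = W_2\, a_1 + b_2$$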
a1^(g) — activation of the hidden layer with h^(g) hidden nodes
W1^(g) — 1st-layer weights
W2^(g) — 2nd-layer weights
b1^(g) — 1st-layer bias
b2^(g) — 2nd-layer bias
[c; xm] — concatenation of the weighted average document vector and the vector of the last word in s
w(ti) — weighting function that captures the importance of word ti in the document (tf-idf)
score_g — global-context scoring function
d — document as an ordered list of word embeddings
c — weighted average of all word vectors in a document
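And the global-context score (same caveat: reconstructed from Huang et al. 2012):

$$c = \frac{\sum_{i=1}^{k} w(t_i)\, d_i}{\sum_{i=1}^{k} w(t_i)}, \qquad a_1^{(g)} = f\bigl(W_1^{(g)}\,[c;\, x_m] + b_1^{(g)}\bigr), \qquad \mathrm{score}_g(s, d) = W_2^{(g)}\, a_1^{(g)} + b_2^{(g)}$$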
[Diagram, repeated: the Semantic Space, now alongside an Image Features space]
Image Feature Learning
• High-level description: extract random patches, extract features from sub-patches, pool the features, train a linear classifier to predict labels (see the sketch below)
• Takeaway: fast, simple algorithms with the correct parameters work as well as complex, slow algorithms
[Coates et al. 2011 (used by Coates & Ng 2011)]
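A compressed sketch of that pipeline (all shapes and parameters are illustrative; k-means is one of the several dictionary-learning methods Coates et al. compare, and the "triangle" encoding below is theirs):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def random_patches(images, n=5000, size=6, rng=np.random.default_rng(0)):
    """Sample random size x size patches from grayscale images (N, H, W)."""
    N, H, W = images.shape
    out = np.empty((n, size * size))
    for i in range(n):
        k, y, x = rng.integers(N), rng.integers(H - size), rng.integers(W - size)
        p = images[k, y:y + size, x:x + size].ravel()
        out[i] = (p - p.mean()) / (p.std() + 1e-8)  # per-patch normalization
    return out

def encode(image, km, size=6, stride=4):
    """'Triangle' features for every sub-patch, then crude 2-region pooling."""
    H, W = image.shape
    feats = []
    for y in range(0, H - size + 1, stride):
        for x in range(0, W - size + 1, stride):
            p = image[y:y + size, x:x + size].ravel()
            p = (p - p.mean()) / (p.std() + 1e-8)
            dist = np.linalg.norm(p - km.cluster_centers_, axis=1)
            feats.append(np.maximum(0.0, dist.mean() - dist))  # f_k = max(0, mu - z_k)
    F = np.array(feats)
    half = len(F) // 2
    return np.concatenate([F[:half].sum(0), F[half:].sum(0)])

# Learn a patch dictionary, featurize, then fit the linear classifier.
X = np.random.rand(40, 32, 32); y = np.random.randint(0, 2, 40)  # stand-in data
km = KMeans(n_clusters=50, n_init=4).fit(random_patches(X))
Z = np.array([encode(im, km) for im in X])
clf = LogisticRegression(max_iter=1000).fit(Z, y)
```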
[Diagram: images are mapped from the Image Features space into the Semantic Space, forming the joint Visual-Semantic Space (this paper, Socher et al. 2013)]
Projecting Images into Semantic Space
Objective function (reconstructed below): [Socher et al. 2013]
X_y — training images of (seen) class y
W — set of word vectors for the seen/unseen visual classes
w_y — the word vector (class name) that each image of class y is mapped to
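The objective shown on the slide (reconstructed from Socher et al. 2013): a two-layer neural network, with f = tanh, is trained to map the image features x^(i) of every seen class y onto that class's word vector w_y:

$$J(\Theta) = \sum_{y \in Y_s}\; \sum_{x^{(i)} \in X_y} \bigl\lVert\, w_y - \theta^{(2)} f\bigl(\theta^{(1)} x^{(i)}\bigr) \bigr\rVert^2$$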
[Figure: t-SNE visualization of the semantic word space (Socher et al. 2013)]
Projecting Images into Semantic Space (Outlier Detection) [Socher et al. 2013]
Mapped points of seen classes: [figure]
Predicting class y:
V — binary visibility random variable (seen or unseen)
P(V = u | x) — probability of an image being in an unseen class
Threshold T: [equation rendered as image]
Projecting Images into Semantic Space (Outlier Detection) [Socher et al. 2013]
V — binary visibility random variable
P(V = u | x) — probability of an image being in an unseen class
Known-class prediction: [equation rendered as image; see the sketch below]
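A sketch of the resulting decision rule, under loud simplifications: the outlier detector is reduced to thresholding log-densities of class-specific isometric Gaussians at a single T (the paper also evaluates a LoOP-based unseen probability), the seen-class step is nearest-Gaussian rather than the paper's softmax classifier, and every name below is illustrative:

```python
import numpy as np

def predict(img_vec, seen, unseen, sigma2=1.0, T=-10.0):
    """Two-stage zero-shot prediction (simplified from Socher et al. 2013).

    seen / unseen: dicts mapping class name -> word vector.
    If the mapped image is an outlier w.r.t. every seen class's Gaussian,
    fall back to the nearest word vector among the unseen classes.
    """
    def log_gauss(x, mu):               # isometric Gaussian, up to a constant
        d = x - mu
        return -0.5 * (d @ d) / sigma2
    seen_scores = {y: log_gauss(img_vec, w) for y, w in seen.items()}
    if max(seen_scores.values()) < T:   # outlier -> predict among unseen classes
        return min(unseen, key=lambda y: np.linalg.norm(img_vec - unseen[y]))
    return max(seen_scores, key=seen_scores.get)
```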
Results
[Figures: Socher et al. 2013; Socher et al. NIPS 2013]
Main Contributions
Zero-shot learning
• Good classification of (pairs of) unseen classes can be achieved based on learned representations for these classes
• => as opposed to hand-designed representations
• => extends (Lampert et al. 2009; Qi et al. 2011) [manually defined visual/semantic attributes used to classify unseen classes]
Main Contributions
“Multi”-shot learning
• Deals with both seen and unseen classes, combining zero-shot and seen classification into one framework:

[ Ø-shot + 1-shot = “multi-shot” :) ]
• Assumption: unseen classes appear as outliers
• Major weakness:

seen-class accuracy drops from 80% to 70% in exchange for 15%-30% accuracy on particular unseen classes
• => extends (Lampert et al. 2009; Palatucci et al. 2009) [manually defined representations, limited to zero-shot classes], by using outlier detection
• => extends (Weston et al. 2010) (joint embedding of images and labels through a linear mapping) [linear mapping only, so it can't generalise to new classes: 1-shot], by using outlier detection
Main Contributions
Knowledge-Transfer
• Allows transfer of knowledge between modalities, within
multimodal embeddings
• Allows for unsupervised matching
• => extends (Socher & Fei-Fei 2010) (kernelized canonical correlation analysis) [still requires a small amount of training data for each class: 1-shot]
• => extends (Salakhutdinov et al. 2012) (learns low-level image features, followed by a probabilistic model to transfer knowledge) [also limited to 1-shot classes]
Bibliography
• C. H. Lampert, H. Nickisch, and S. Harmeling. Learning to Detect Unseen Object Classes by Between-Class Attribute Transfer. In CVPR, 2009
• M. Palatucci, D. Pomerleau, G. Hinton, and T. Mitchell. Zero-shot learning with semantic output codes. In NIPS, 2009
• G.-J. Qi, C. Aggarwal, Y. Rui, Q. Tian, S. Chang, and T. Huang. Towards cross-category knowledge propagation for learning visual concepts. In CVPR, 2011
• R. Socher and L. Fei-Fei. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora. In CVPR, 2010
• E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng. Improving Word Representations via Global Context and Multiple Word Prototypes. In ACL, 2012
Bibliography
• A. Coates and A. Y. Ng. The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization. In ICML, 2011
• A. Coates, H. Lee, and A. Y. Ng. An analysis of single-layer networks in unsupervised feature learning. In AISTATS, 2011
• R. Salakhutdinov, J. Tenenbaum, and A. Torralba. Learning to learn with compound hierarchical-deep models. In NIPS, 2012
• J. Weston, S. Bengio, and N. Usunier. Large Scale Image Annotation: Learning to Rank with Joint Word-Image Embeddings. Machine Learning, 81(1):21-35, 2010
