Describing Images using Inferred
Visual Dependency Representations
BY DESMOND ELLIOTT & ARJEN P. DE VRIES
PRESENTED BY
P.R.P.L.S KUMARA – 158224B
Overview
 What is VDR
 Objectives of the Paper
 Related Works & Their Limitations
 Methodology
 Linguistic Processing
 Visual Processing
 Building a Language Model
 Generating Descriptions
 Evaluation
 Models
 Evaluation Measures
 Data Sets
 Results
 Conclusion
What is VDR
 Structured representation of an image that explicitly models the spatial
relationships between objects
 Spatial relationship between a pair of objects is encoded with one of the
following eight options:
 Above
 Below
 Beside
 Opposite
 On
 Surrounds
 In front
 Behind
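
A minimal sketch of how the representation above could be held in code. The names `SpatialRelation` and `VDREdge`, the string values, and the example labels are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class SpatialRelation(Enum):
    """The eight spatial relationships listed above."""
    ABOVE = "above"
    BELOW = "below"
    BESIDE = "beside"
    OPPOSITE = "opposite"
    ON = "on"
    SURROUNDS = "surrounds"
    IN_FRONT = "in front"
    BEHIND = "behind"


@dataclass(frozen=True)
class VDREdge:
    """One labelled arc between a pair of objects in a VDR."""
    head: str    # hypothetical object label, e.g. "person"
    child: str   # hypothetical object label, e.g. "bike"
    relation: SpatialRelation


# Illustrative example only: a person positioned on a bike.
edge = VDREdge(head="person", child="bike", relation=SpatialRelation.ON)
print(edge.head, edge.relation.value, edge.child)
```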
Objectives of the Paper
 Training a VDR parsing model without extensive human supervision
 Approach is to find the objects mentioned in a given description using a
state-of-the-art object detector, and to use successful detections to
produce training data.
Objectives of the Paper
 Generating descriptions for unseen images from VDR
 The description of an unseen image is produced by first predicting its VDR over
automatically detected objects, and then generating the text with a template-based
generation model using the predicted VDR
Related Works & Their Limitations
 Approaches
 Models that rely on spatial relationships
 Corpus-based relationships
 Spatial and visual attributes
 n-gram phrase fusion from Web-scale corpora
 Recurrent neural networks
 Limitation
 VDR relies on gold-standard training annotations, which require trained
annotators
Methodology
 An inferred VDR is constructed by searching for the subject and object referred to in the image
description using an object detector.
 VDR is created by attaching the detected subject to the detected object
 The spatial relationships applied between subjects and objects are defined as follows
Methodology
 Linguistic Processing
 Description of an image is processed to extract candidates for the mentioned objects
 Candidates are taken from the nsubj and dobj tokens in the dependency-parsed description
 An example is discarded if the parsed description does not contain both a subject and an object
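
The extraction step above can be sketched as follows. This is only an illustration: the parse is written by hand as (token, dependency label) pairs rather than produced by a real parser, and the function name `extract_candidates` is hypothetical.

```python
from typing import List, Optional, Tuple

# A parsed description as (token, dependency_label) pairs. In practice these
# would come from a dependency parser; here they are written by hand.
ParsedDescription = List[Tuple[str, str]]


def extract_candidates(parsed: ParsedDescription) -> Optional[Tuple[str, str]]:
    """Return (subject, object) from the nsubj and dobj tokens, or None.

    None signals that the example should be discarded because the parse
    lacks a subject, an object, or both.
    """
    subject = next((tok for tok, dep in parsed if dep == "nsubj"), None)
    obj = next((tok for tok, dep in parsed if dep == "dobj"), None)
    if subject is None or obj is None:
        return None
    return subject, obj


# Hand-written parse of "A man is riding a bike" (illustrative only).
parsed = [("A", "det"), ("man", "nsubj"), ("is", "aux"),
          ("riding", "root"), ("a", "det"), ("bike", "dobj")]
print(extract_candidates(parsed))                                # ('man', 'bike')
print(extract_candidates([("Dogs", "nsubj"), ("run", "root")]))  # None
```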
Methodology
 Visual Processing
 Attempt to find candidate objects in the image using the Regions with Convolutional Neural Network
features (R-CNN) object detector
 Output of the object detector is a bounding box with real-valued confidence scores
 Increase the chance of finding objects by lemmatizing the token, and transforming the token
into its WordNet hypernym parent
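
A rough sketch of the matching step described above, under several assumptions. The lookup tables `LEMMAS` and `HYPERNYM_PARENT` are hand-made stand-ins for a real lemmatizer and WordNet and are not guaranteed to match either. The order in which labels are tried, and the choice of the most confident matching box, are also assumptions rather than details confirmed by the slides.

```python
from typing import Dict, List, Optional, Tuple

# A detection is (label, confidence, bounding box as x1, y1, x2, y2).
Detection = Tuple[str, float, Tuple[int, int, int, int]]

# Toy lookup tables standing in for a real lemmatizer and WordNet (assumed values).
LEMMAS: Dict[str, str] = {"puppies": "puppy"}
HYPERNYM_PARENT: Dict[str, str] = {"puppy": "dog"}


def candidate_labels(token: str) -> List[str]:
    """Labels to try, in order: the token, its lemma, the lemma's hypernym parent."""
    labels = [token]
    lemma = LEMMAS.get(token, token)
    if lemma not in labels:
        labels.append(lemma)
    parent = HYPERNYM_PARENT.get(lemma)
    if parent is not None and parent not in labels:
        labels.append(parent)
    return labels


def find_object(token: str, detections: List[Detection]) -> Optional[Detection]:
    """Return the most confident detection for the first label that matches."""
    for label in candidate_labels(token):
        matches = [d for d in detections if d[0] == label]
        if matches:
            return max(matches, key=lambda d: d[1])
    return None


detections = [("dog", 0.4, (10, 20, 110, 160)), ("dog", 0.9, (30, 40, 200, 220))]
print(candidate_labels("puppies"))         # ['puppies', 'puppy', 'dog']
print(find_object("puppies", detections))  # ('dog', 0.9, (30, 40, 200, 220))
```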
Methodology
 Collection of automatically inferred VDR
Methodology
 Building a Language Model
 Extract the top-N objects from an image using an object detector
 Predict the spatial relationships between the objects using a VDR Parser
 Descriptions are generated for all parent–child subtrees in the VDR
 Final text has the highest combined corpus and visual confidence
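
The final selection step above might look like the sketch below. The slides do not say how the corpus and visual confidences are combined, so multiplying them is an assumption, and the `Candidate` type and example scores are invented for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    text: str
    corpus_confidence: float   # support for the text in the training descriptions
    visual_confidence: float   # detector confidence in the objects used


def combined_score(c: Candidate) -> float:
    # Assumption: the two confidences are combined by multiplication.
    return c.corpus_confidence * c.visual_confidence


def select_description(candidates: List[Candidate]) -> str:
    """Return the text of the candidate with the highest combined score."""
    return max(candidates, key=combined_score).text


# One candidate per parent-child subtree (made-up values).
candidates = [
    Candidate("The person is riding the bike", 0.6, 0.9),
    Candidate("The person is holding the bottle", 0.8, 0.3),
]
print(select_description(candidates))  # The person is riding the bike
```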
Methodology
 Generating Descriptions
 Using a template-based language generation model
 Descriptions are generated using the following template:
DT head is V DT child
 Here, head and child are the labels of the objects that appear in the head and child positions of a specific VDR subtree
 Model captures statistics about
 Nouns that appear as subjects and objects
 The verbs between them
 Spatial relationships observed in the inferred training VDRs
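
A small sketch of filling the template above. It assumes the statistics are stored as verb counts per (head, child, relation) triple, that V is the most frequent verb for the triple, and that DT is always "the"; the `stats` table is invented for illustration.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Hypothetical counts of verbs observed for (head, child, relation) triples.
VerbStats = Dict[Tuple[str, str, str], Counter]

stats: VerbStats = {
    ("person", "bike", "on"): Counter({"riding": 7, "pushing": 2}),
    ("person", "horse", "beside"): Counter({"leading": 3, "feeding": 1}),
}


def generate(head: str, child: str, relation: str, stats: VerbStats) -> Optional[str]:
    """Fill 'DT head is V DT child' with the most frequent verb for the triple."""
    verbs = stats.get((head, child, relation))
    if not verbs:
        return None  # no statistics for this subtree
    verb, _ = verbs.most_common(1)[0]
    return f"The {head} is {verb} the {child}"


print(generate("person", "bike", "on", stats))  # The person is riding the bike
print(generate("dog", "sofa", "on", stats))     # None
```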
Evaluation
 Generate a natural language description of an image, which is evaluated directly against
multiple reference descriptions
 Models
 Compare against the following image description models
 MIDGE (generates text based on a tree-substitution grammar and relies on discrete object detections for visual input)
 BRNN (multimodal deep neural network that generates descriptions directly from vector representations of the
image and the description)
Evaluation
 Evaluation Measures
 Evaluate the generated descriptions using sentence-level Meteor and BLEU4
 They adopt a jack-knifing evaluation methodology, which makes it possible to report human–human results
 Data Sets
 Pascal1K
 Contains 1,000 images sampled from the PASCAL Object Detection Challenge data set; each image is paired with
five reference descriptions collected from Mechanical Turk. It covers a wide variety of subject matter
 VLT2K
 Contains 2,424 images; each image is paired with three reference descriptions, also collected from Mechanical Turk
 Split the images into 80% training, 10% validation, and 10% test
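
One common reading of the jack-knifing methodology mentioned above is sketched here: each reference is held out in turn, the generated description is scored against the remaining references, and the held-out reference is scored against the same remainder to give a human–human figure. This reading is an assumption, and `unigram_overlap` is only a toy stand-in for Meteor or BLEU4.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, List[str]], float]


def unigram_overlap(candidate: str, references: List[str]) -> float:
    """Toy metric: best fraction of candidate words that occur in one reference."""
    words = candidate.lower().split()
    if not words or not references:
        return 0.0
    best = 0.0
    for ref in references:
        ref_words = set(ref.lower().split())
        best = max(best, sum(w in ref_words for w in words) / len(words))
    return best


def jackknife(candidate: str, references: List[str], score: Scorer) -> Tuple[float, float]:
    """Return (system score, human-human score), averaged over held-out folds.

    Expects at least two references so that every fold has something left
    to score against.
    """
    system, human = [], []
    for i, held_out in enumerate(references):
        rest = references[:i] + references[i + 1:]
        system.append(score(candidate, rest))
        human.append(score(held_out, rest))
    n = len(references)
    return sum(system) / n, sum(human) / n


references = ["a man rides a bike", "a person on a bicycle", "a man on a bike"]
system_score, human_score = jackknife("the man rides the bike", references, unigram_overlap)
print(system_score, human_score)
```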
Evaluation
 Results
 Optimizing the number of detected objects against generated description Meteor scores
 Improvements are seen until eight objects
 Good descriptions do not always need the most confident detections
Conclusion
 Showed that Visual Dependency Representations of images can be inferred without expensive human
supervision
 Quality of the generated text largely depended on the data set
 Quality of the descriptions depends on whether the images depict an action
 Encoding the spatial relationships between objects is a useful way of learning how to
describe actions
 Future improvements with broader coverage object detectors
 Relax the strict mirroring of human annotator behavior when searching for subjects
and objects in an image
 n-gram based language model constrained by the structure predicted in the VDR
Thank You…!!!
20

More Related Content

What's hot

An Empirical Study for Defect Prediction using Clustering
An Empirical Study for Defect Prediction using ClusteringAn Empirical Study for Defect Prediction using Clustering
An Empirical Study for Defect Prediction using Clusteringidescitation
 
Designing Hybrid Cryptosystem for Secure Transmission of Image Data using Bio...
Designing Hybrid Cryptosystem for Secure Transmission of Image Data using Bio...Designing Hybrid Cryptosystem for Secure Transmission of Image Data using Bio...
Designing Hybrid Cryptosystem for Secure Transmission of Image Data using Bio...ranjit banshpal
 
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...A Novel Classification via Clustering Method for Anomaly Based Network Intrus...
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...IDES Editor
 
Assessment of Programming Language Reliability Utilizing Soft-Computing
Assessment of Programming Language Reliability Utilizing Soft-ComputingAssessment of Programming Language Reliability Utilizing Soft-Computing
Assessment of Programming Language Reliability Utilizing Soft-Computingijcsa
 
A Survey of Deep Learning Algorithms for Malware Detection
A Survey of Deep Learning Algorithms for Malware DetectionA Survey of Deep Learning Algorithms for Malware Detection
A Survey of Deep Learning Algorithms for Malware DetectionIJCSIS Research Publications
 
AN EFFECTIVE SEMANTIC ENCRYPTED RELATIONAL DATA USING K-NN MODEL
AN EFFECTIVE SEMANTIC ENCRYPTED RELATIONAL DATA USING K-NN MODELAN EFFECTIVE SEMANTIC ENCRYPTED RELATIONAL DATA USING K-NN MODEL
AN EFFECTIVE SEMANTIC ENCRYPTED RELATIONAL DATA USING K-NN MODELijsptm
 
Show observe and tell giang nguyen
Show observe and tell   giang nguyenShow observe and tell   giang nguyen
Show observe and tell giang nguyenNguyen Giang
 
A SECURE STEGANOGRAPHY APPROACH FOR CLOUD DATA USING ANN ALONG WITH PRIVATE K...
A SECURE STEGANOGRAPHY APPROACH FOR CLOUD DATA USING ANN ALONG WITH PRIVATE K...A SECURE STEGANOGRAPHY APPROACH FOR CLOUD DATA USING ANN ALONG WITH PRIVATE K...
A SECURE STEGANOGRAPHY APPROACH FOR CLOUD DATA USING ANN ALONG WITH PRIVATE K...IJCSIS Research Publications
 
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSCONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSijseajournal
 
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...Distribution Similarity based Data Partition and Nearest Neighbor Search on U...
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...Editor IJMTER
 
Handwritten Digit Recognition
Handwritten Digit RecognitionHandwritten Digit Recognition
Handwritten Digit Recognitionijtsrd
 
Identification of Relevant Sections in Web Pages Using a Machine Learning App...
Identification of Relevant Sections in Web Pages Using a Machine Learning App...Identification of Relevant Sections in Web Pages Using a Machine Learning App...
Identification of Relevant Sections in Web Pages Using a Machine Learning App...Jerrin George
 
Cat and dog classification
Cat and dog classificationCat and dog classification
Cat and dog classificationomaraldabash
 

What's hot (20)

An Empirical Study for Defect Prediction using Clustering
An Empirical Study for Defect Prediction using ClusteringAn Empirical Study for Defect Prediction using Clustering
An Empirical Study for Defect Prediction using Clustering
 
Designing Hybrid Cryptosystem for Secure Transmission of Image Data using Bio...
Designing Hybrid Cryptosystem for Secure Transmission of Image Data using Bio...Designing Hybrid Cryptosystem for Secure Transmission of Image Data using Bio...
Designing Hybrid Cryptosystem for Secure Transmission of Image Data using Bio...
 
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...A Novel Classification via Clustering Method for Anomaly Based Network Intrus...
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...
 
JOSA TechTalks - Machine Learning in Practice
JOSA TechTalks - Machine Learning in PracticeJOSA TechTalks - Machine Learning in Practice
JOSA TechTalks - Machine Learning in Practice
 
Assessment of Programming Language Reliability Utilizing Soft-Computing
Assessment of Programming Language Reliability Utilizing Soft-ComputingAssessment of Programming Language Reliability Utilizing Soft-Computing
Assessment of Programming Language Reliability Utilizing Soft-Computing
 
Terminology Machine Learning
Terminology Machine LearningTerminology Machine Learning
Terminology Machine Learning
 
A Survey of Deep Learning Algorithms for Malware Detection
A Survey of Deep Learning Algorithms for Malware DetectionA Survey of Deep Learning Algorithms for Malware Detection
A Survey of Deep Learning Algorithms for Malware Detection
 
Deep learning ppt
Deep learning pptDeep learning ppt
Deep learning ppt
 
AN EFFECTIVE SEMANTIC ENCRYPTED RELATIONAL DATA USING K-NN MODEL
AN EFFECTIVE SEMANTIC ENCRYPTED RELATIONAL DATA USING K-NN MODELAN EFFECTIVE SEMANTIC ENCRYPTED RELATIONAL DATA USING K-NN MODEL
AN EFFECTIVE SEMANTIC ENCRYPTED RELATIONAL DATA USING K-NN MODEL
 
Show observe and tell giang nguyen
Show observe and tell   giang nguyenShow observe and tell   giang nguyen
Show observe and tell giang nguyen
 
A SECURE STEGANOGRAPHY APPROACH FOR CLOUD DATA USING ANN ALONG WITH PRIVATE K...
A SECURE STEGANOGRAPHY APPROACH FOR CLOUD DATA USING ANN ALONG WITH PRIVATE K...A SECURE STEGANOGRAPHY APPROACH FOR CLOUD DATA USING ANN ALONG WITH PRIVATE K...
A SECURE STEGANOGRAPHY APPROACH FOR CLOUD DATA USING ANN ALONG WITH PRIVATE K...
 
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSCONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
 
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...Distribution Similarity based Data Partition and Nearest Neighbor Search on U...
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...
 
Handwritten Digit Recognition
Handwritten Digit RecognitionHandwritten Digit Recognition
Handwritten Digit Recognition
 
Jz3118501853
Jz3118501853Jz3118501853
Jz3118501853
 
Applications of Machine Learning
Applications of Machine LearningApplications of Machine Learning
Applications of Machine Learning
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
 
Identification of Relevant Sections in Web Pages Using a Machine Learning App...
Identification of Relevant Sections in Web Pages Using a Machine Learning App...Identification of Relevant Sections in Web Pages Using a Machine Learning App...
Identification of Relevant Sections in Web Pages Using a Machine Learning App...
 
G44093135
G44093135G44093135
G44093135
 
Cat and dog classification
Cat and dog classificationCat and dog classification
Cat and dog classification
 

Viewers also liked

Visual Representation
Visual RepresentationVisual Representation
Visual Representationkbrunclik
 
Sketch ins- a tel design technique
Sketch ins- a tel design techniqueSketch ins- a tel design technique
Sketch ins- a tel design techniqueBrock Craft
 
Notes on visual representation
Notes on visual representationNotes on visual representation
Notes on visual representationBrock Craft
 
Educational Technology Graphic/Audio Visual Materials
Educational Technology Graphic/Audio Visual MaterialsEducational Technology Graphic/Audio Visual Materials
Educational Technology Graphic/Audio Visual MaterialsAvigail Gabaleo Maximo
 
Graphical Representation of data
Graphical Representation of dataGraphical Representation of data
Graphical Representation of dataJijo K Mathew
 

Viewers also liked (6)

Visual Representation
Visual RepresentationVisual Representation
Visual Representation
 
Sketch ins- a tel design technique
Sketch ins- a tel design techniqueSketch ins- a tel design technique
Sketch ins- a tel design technique
 
Notes on visual representation
Notes on visual representationNotes on visual representation
Notes on visual representation
 
Educational Technology Graphic/Audio Visual Materials
Educational Technology Graphic/Audio Visual MaterialsEducational Technology Graphic/Audio Visual Materials
Educational Technology Graphic/Audio Visual Materials
 
Representation
RepresentationRepresentation
Representation
 
Graphical Representation of data
Graphical Representation of dataGraphical Representation of data
Graphical Representation of data
 

Similar to Describing Images using Visual Dependency Representation

Image classification with Deep Neural Networks
Image classification with Deep Neural NetworksImage classification with Deep Neural Networks
Image classification with Deep Neural NetworksYogendra Tamang
 
最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - 最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - Hiroshi Fukui
 
imageclassification-160206090009.pdf
imageclassification-160206090009.pdfimageclassification-160206090009.pdf
imageclassification-160206090009.pdfKammetaJoshna
 
Ijarcet vol-2-issue-4-1374-1382
Ijarcet vol-2-issue-4-1374-1382Ijarcet vol-2-issue-4-1374-1382
Ijarcet vol-2-issue-4-1374-1382Editor IJARCET
 
DEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTM
DEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTMDEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTM
DEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTMIRJET Journal
 
Object Detection on Dental X-ray Images using R-CNN
Object Detection on Dental X-ray Images using R-CNNObject Detection on Dental X-ray Images using R-CNN
Object Detection on Dental X-ray Images using R-CNNMinhazul Arefin
 
Modelling Framework of a Neural Object Recognition
Modelling Framework of a Neural Object RecognitionModelling Framework of a Neural Object Recognition
Modelling Framework of a Neural Object RecognitionIJERA Editor
 
Deep Learning for X ray Image to Text Generation
Deep Learning for X ray Image to Text GenerationDeep Learning for X ray Image to Text Generation
Deep Learning for X ray Image to Text Generationijtsrd
 
Scene recognition using Convolutional Neural Network
Scene recognition using Convolutional Neural NetworkScene recognition using Convolutional Neural Network
Scene recognition using Convolutional Neural NetworkDhirajGidde
 
Image Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine LearningImage Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine Learningijtsrd
 
IRJET - Visual Question Answering – Implementation using Keras
IRJET -  	  Visual Question Answering – Implementation using KerasIRJET -  	  Visual Question Answering – Implementation using Keras
IRJET - Visual Question Answering – Implementation using KerasIRJET Journal
 
Object Recogniton Based on Undecimated Wavelet Transform
Object Recogniton Based on Undecimated Wavelet TransformObject Recogniton Based on Undecimated Wavelet Transform
Object Recogniton Based on Undecimated Wavelet TransformIJCOAiir
 
Visual Translation Embedding Network for Visual Relation Detection (UPC Readi...
Visual Translation Embedding Network for Visual Relation Detection (UPC Readi...Visual Translation Embedding Network for Visual Relation Detection (UPC Readi...
Visual Translation Embedding Network for Visual Relation Detection (UPC Readi...Universitat Politècnica de Catalunya
 
ATTENTION BASED IMAGE CAPTIONING USING DEEP LEARNING
ATTENTION BASED IMAGE CAPTIONING USING DEEP LEARNINGATTENTION BASED IMAGE CAPTIONING USING DEEP LEARNING
ATTENTION BASED IMAGE CAPTIONING USING DEEP LEARNINGNathan Mathis
 
OBJECT DETECTION AND RECOGNITION: A SURVEY
OBJECT DETECTION AND RECOGNITION: A SURVEYOBJECT DETECTION AND RECOGNITION: A SURVEY
OBJECT DETECTION AND RECOGNITION: A SURVEYJournal For Research
 
Hierarchical deep learning architecture for 10 k objects classification
Hierarchical deep learning architecture for 10 k objects classificationHierarchical deep learning architecture for 10 k objects classification
Hierarchical deep learning architecture for 10 k objects classificationcsandit
 
HIERARCHICAL DEEP LEARNING ARCHITECTURE FOR 10K OBJECTS CLASSIFICATION
HIERARCHICAL DEEP LEARNING ARCHITECTURE FOR 10K OBJECTS CLASSIFICATIONHIERARCHICAL DEEP LEARNING ARCHITECTURE FOR 10K OBJECTS CLASSIFICATION
HIERARCHICAL DEEP LEARNING ARCHITECTURE FOR 10K OBJECTS CLASSIFICATIONcscpconf
 
fuzzy LBP for face recognition ppt
fuzzy LBP for face recognition pptfuzzy LBP for face recognition ppt
fuzzy LBP for face recognition pptAbdullah Gubbi
 
Stanford DeepDive Framework
Stanford DeepDive FrameworkStanford DeepDive Framework
Stanford DeepDive FrameworkRan Zhang
 

Similar to Describing Images using Visual Dependency Representation (20)

Image classification with Deep Neural Networks
Image classification with Deep Neural NetworksImage classification with Deep Neural Networks
Image classification with Deep Neural Networks
 
最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - 最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に -
 
imageclassification-160206090009.pdf
imageclassification-160206090009.pdfimageclassification-160206090009.pdf
imageclassification-160206090009.pdf
 
Ijarcet vol-2-issue-4-1374-1382
Ijarcet vol-2-issue-4-1374-1382Ijarcet vol-2-issue-4-1374-1382
Ijarcet vol-2-issue-4-1374-1382
 
DEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTM
DEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTMDEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTM
DEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTM
 
Object Detection on Dental X-ray Images using R-CNN
Object Detection on Dental X-ray Images using R-CNNObject Detection on Dental X-ray Images using R-CNN
Object Detection on Dental X-ray Images using R-CNN
 
Modelling Framework of a Neural Object Recognition
Modelling Framework of a Neural Object RecognitionModelling Framework of a Neural Object Recognition
Modelling Framework of a Neural Object Recognition
 
Deep Learning for X ray Image to Text Generation
Deep Learning for X ray Image to Text GenerationDeep Learning for X ray Image to Text Generation
Deep Learning for X ray Image to Text Generation
 
MultiModal Retrieval Image
MultiModal Retrieval ImageMultiModal Retrieval Image
MultiModal Retrieval Image
 
Scene recognition using Convolutional Neural Network
Scene recognition using Convolutional Neural NetworkScene recognition using Convolutional Neural Network
Scene recognition using Convolutional Neural Network
 
Image Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine LearningImage Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine Learning
 
IRJET - Visual Question Answering – Implementation using Keras
IRJET -  	  Visual Question Answering – Implementation using KerasIRJET -  	  Visual Question Answering – Implementation using Keras
IRJET - Visual Question Answering – Implementation using Keras
 
Object Recogniton Based on Undecimated Wavelet Transform
Object Recogniton Based on Undecimated Wavelet TransformObject Recogniton Based on Undecimated Wavelet Transform
Object Recogniton Based on Undecimated Wavelet Transform
 
Visual Translation Embedding Network for Visual Relation Detection (UPC Readi...
Visual Translation Embedding Network for Visual Relation Detection (UPC Readi...Visual Translation Embedding Network for Visual Relation Detection (UPC Readi...
Visual Translation Embedding Network for Visual Relation Detection (UPC Readi...
 
ATTENTION BASED IMAGE CAPTIONING USING DEEP LEARNING
ATTENTION BASED IMAGE CAPTIONING USING DEEP LEARNINGATTENTION BASED IMAGE CAPTIONING USING DEEP LEARNING
ATTENTION BASED IMAGE CAPTIONING USING DEEP LEARNING
 
OBJECT DETECTION AND RECOGNITION: A SURVEY
OBJECT DETECTION AND RECOGNITION: A SURVEYOBJECT DETECTION AND RECOGNITION: A SURVEY
OBJECT DETECTION AND RECOGNITION: A SURVEY
 
Hierarchical deep learning architecture for 10 k objects classification
Hierarchical deep learning architecture for 10 k objects classificationHierarchical deep learning architecture for 10 k objects classification
Hierarchical deep learning architecture for 10 k objects classification
 
HIERARCHICAL DEEP LEARNING ARCHITECTURE FOR 10K OBJECTS CLASSIFICATION
HIERARCHICAL DEEP LEARNING ARCHITECTURE FOR 10K OBJECTS CLASSIFICATIONHIERARCHICAL DEEP LEARNING ARCHITECTURE FOR 10K OBJECTS CLASSIFICATION
HIERARCHICAL DEEP LEARNING ARCHITECTURE FOR 10K OBJECTS CLASSIFICATION
 
fuzzy LBP for face recognition ppt
fuzzy LBP for face recognition pptfuzzy LBP for face recognition ppt
fuzzy LBP for face recognition ppt
 
Stanford DeepDive Framework
Stanford DeepDive FrameworkStanford DeepDive Framework
Stanford DeepDive Framework
 

Recently uploaded

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 

Recently uploaded (20)

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 

Describing Images using Visual Dependency Representation

  • 1. Describing Images using Inferred Visual Dependency Representations, by Desmond Elliott & Arjen P. de Vries. Presented by P.R.P.L.S Kumara – 158224B
  • 2. Overview
      • What is VDR
      • Objectives of the Paper
      • Related Works & Their Limitations
      • Methodology
          • Linguistic Processing
          • Visual Processing
          • Building a Language Model
          • Generating Descriptions
      • Evaluation
          • Models
          • Evaluation Measures
          • Data Sets
          • Results
      • Conclusion
  • 3. What is VDR
      • A Visual Dependency Representation (VDR) is a structured representation of an image that explicitly models the spatial relationships between objects
      • The spatial relationship between a pair of objects is encoded with one of the following eight labels: above, below, beside, opposite, on, surrounds, in front, behind
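One way to picture this representation is as a set of labelled arcs between image regions. The sketch below is only an illustration: the eight labels come from the slide, while the class names `SpatialRelation`, `Region`, and `VdrEdge` and the `(x, y, width, height)` box layout are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class SpatialRelation(Enum):
    """The eight relation labels listed on the slide."""
    ABOVE = "above"
    BELOW = "below"
    BESIDE = "beside"
    OPPOSITE = "opposite"
    ON = "on"
    SURROUNDS = "surrounds"
    IN_FRONT = "in front"
    BEHIND = "behind"


@dataclass(frozen=True)
class Region:
    """A labelled image region (assumed box layout: x, y, width, height)."""
    label: str
    box: tuple


@dataclass(frozen=True)
class VdrEdge:
    """One directed head -> child arc of a VDR, labelled with a relation."""
    head: Region
    child: Region
    relation: SpatialRelation


# Hypothetical regions for an image of a man riding a bike.
man = Region("man", (40, 30, 80, 200))
bike = Region("bike", (30, 120, 140, 110))
edge = VdrEdge(head=man, child=bike, relation=SpatialRelation.ON)
print(edge.head.label, edge.relation.value, edge.child.label)
```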
  • 4. Objectives of the Paper
      • Train a VDR parsing model without extensive human supervision
      • The approach is to find the objects mentioned in a given description using a state-of-the-art object detector, and to use successful detections to produce training data
  • 5. Objectives of the Paper
      • Generate descriptions for unseen images from their VDRs
      • The description of an unseen image is produced by first predicting its VDR over automatically detected objects, and then generating the text with a template-based generation model using the predicted VDR
  • 6. Related Works & Their Limitations
      • Approaches
          • Models that rely on spatial relationships
          • Corpus-based relationships
          • Spatial and visual attributes
          • n-gram phrase fusion from Web-scale corpora
          • Recurrent neural networks
      • Limitation
          • VDR relies on gold-standard training annotations, which require trained annotators
  • 7. Methodology
      • An inferred VDR is constructed by searching for the subject and object referred to in the image description using an object detector
      • The VDR is created by attaching the detected subject to the detected object
      • The spatial relationships applied between subjects and objects are defined as follows
  • 8. Methodology
      • Linguistic Processing
          • The description of an image is processed to extract candidates for the mentioned objects
          • Candidates are taken from the nsubj and dobj tokens in the dependency-parsed description
          • An example is discarded if the parsed description does not contain both a subject and an object
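The filtering step on this slide can be sketched as follows. This is a minimal illustration, assuming the dependency parse is already available as `(token, dependency label)` pairs; the parses here are written by hand rather than produced by a parser, and the function name `extract_candidates` is hypothetical.

```python
from typing import List, Optional, Tuple

# A dependency-parsed description as (token, dependency label) pairs.
# Assumption: these parses are hand-written stand-ins for parser output.
Parse = List[Tuple[str, str]]


def extract_candidates(parse: Parse) -> Optional[Tuple[str, str]]:
    """Return the (subject, object) tokens, or None if either is missing."""
    subject = next((tok for tok, dep in parse if dep == "nsubj"), None)
    obj = next((tok for tok, dep in parse if dep == "dobj"), None)
    if subject is None or obj is None:
        return None  # the example is discarded
    return subject, obj


riding = [("A", "det"), ("man", "nsubj"), ("is", "aux"),
          ("riding", "ROOT"), ("a", "det"), ("bike", "dobj")]
sleeping = [("A", "det"), ("dog", "nsubj"), ("is", "aux"),
            ("sleeping", "ROOT")]

print(extract_candidates(riding))    # ('man', 'bike')
print(extract_candidates(sleeping))  # None
```

The second description has a subject but no direct object, so it would not contribute a training example.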
  • 9. Methodology
      • Visual Processing
          • Attempt to find the candidate objects in the image using the Regions with Convolutional Neural Network features (R-CNN) object detector
          • The output of the object detector is a bounding box with a real-valued confidence score
          • The chance of finding objects is increased by lemmatizing the token and by transforming the token into its WordNet hypernym parent
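The fallback from token to lemma to hypernym might look like the sketch below. It rests on loud assumptions: the `LEMMAS` and `HYPERNYMS` dictionaries are toy stand-ins for a real lemmatizer and for WordNet, the detector output is modelled as a plain dictionary from label to (bounding box, confidence), and the order in which the fallbacks are tried is a guess.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[int, int, int, int]
Detection = Tuple[Box, float]  # bounding box and confidence score

# Toy stand-ins for a lemmatizer and a WordNet hypernym lookup (hypothetical).
LEMMAS = {"bikes": "bike"}
HYPERNYMS = {"puppy": "dog"}


def find_object(token: str,
                detections: Dict[str, Detection]) -> Optional[Tuple[str, Detection]]:
    """Try the token itself, then its lemma, then the lemma's hypernym parent."""
    lemma = LEMMAS.get(token, token)
    for label in (token, lemma, HYPERNYMS.get(lemma)):
        if label is not None and label in detections:
            return label, detections[label]
    return None  # nothing matched, so the object was not found


# Illustrative detector output, not real detections.
detections = {
    "dog": ((10, 20, 100, 80), 0.91),
    "bike": ((5, 5, 50, 40), 0.47),
}

print(find_object("puppy", detections))  # ('dog', ((10, 20, 100, 80), 0.91))
print(find_object("bikes", detections))  # ('bike', ((5, 5, 50, 40), 0.47))
print(find_object("sofa", detections))   # None
```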
  • 10. Methodology
      • Collection of automatically inferred VDRs
  • 11. Methodology
      • Building a Language Model
          • Extract the top-N objects from an image using an object detector
          • Predict the spatial relationships between the objects using a VDR parser
          • Generate descriptions for all parent–child subtrees in the VDR
          • The final text is the one with the highest combined corpus and visual confidence
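The last step, choosing the final text, can be sketched as an argmax over candidate descriptions. The slide does not say how the corpus and visual confidences are combined, so multiplying them is purely an assumption here, as are the `Candidate` type and the illustrative scores.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    text: str
    corpus_confidence: float  # how well the text fits the training corpus
    visual_confidence: float  # how confident the detector is in the objects


def combined(candidate: Candidate) -> float:
    # Assumption: the two confidences are combined by multiplication.
    return candidate.corpus_confidence * candidate.visual_confidence


def best_description(candidates: List[Candidate]) -> str:
    """Return the text of the candidate with the highest combined score."""
    return max(candidates, key=combined).text


# Illustrative values only, not results from the paper.
candidates = [
    Candidate("The man is riding the bike", 0.50, 0.80),
    Candidate("The man is holding the cup", 0.70, 0.30),
]
print(best_description(candidates))  # The man is riding the bike
```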
  • 12. Methodology
      • Generating Descriptions
          • Uses a template-based language generation model
          • Descriptions are generated using the following template: DT head is V DT child
          • In this template, head and child are the labels of the objects that appear in the head and child positions of a specific VDR subtree
          • The model captures statistics about
              • Nouns that appear as subjects and objects
              • The verbs between them
              • Spatial relationships observed in the inferred training VDRs
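Filling the template `DT head is V DT child` could be sketched like this. The verb is picked as the most frequent one observed for a (subject, object, relation) triple, which is one simple reading of the statistics listed above; the counts, the fixed determiner, and the function name `describe` are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical counts of verbs seen for (subject, object, relation) triples
# in the training descriptions; the numbers are illustrative only.
VerbCounts = Dict[Tuple[str, str, str], Counter]


def describe(head: str, child: str, relation: str,
             counts: VerbCounts, det: str = "The") -> str:
    """Fill the template 'DT head is V DT child' with the most frequent verb."""
    verbs = counts.get((head, child, relation))
    if not verbs:
        raise KeyError(f"no verb observed for {(head, child, relation)}")
    verb, _ = verbs.most_common(1)[0]
    return f"{det} {head} is {verb} {det.lower()} {child}"


counts = {("man", "bike", "on"): Counter({"riding": 7, "pushing": 2})}
print(describe("man", "bike", "on", counts))  # The man is riding the bike
```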
  • 13. Evaluation
      • Generate a natural language description of an image, which is evaluated directly against multiple reference descriptions
      • Models
          • Compared against the following image description models
              • MIDGE (generates text based on a tree-substitution grammar and relies on discrete object detections for visual input)
              • BRNN (multimodal deep neural network that generates descriptions directly from vector representations of the image and the description)
  • 14. Evaluation
      • Evaluation Measures
          • The generated descriptions are evaluated using sentence-level Meteor and BLEU4
          • A jack-knifing evaluation methodology is adopted, which makes it possible to report human–human results
      • Data Sets
          • Pascal1K
              • Contains 1,000 images sampled from the PASCAL Object Detection Challenge data set; each image is paired with five reference descriptions collected from Mechanical Turk. It covers a wide variety of subject matter
          • VLT2K
              • Contains 2,424 images; each image is paired with three reference descriptions, also collected from Mechanical Turk
          • The images are split into 80% training, 10% validation, and 10% test
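The data split and the jack-knifing idea can be sketched as below. Several things are assumed: the split is done by shuffling with a fixed seed (the slide does not say how images are assigned), the image identifiers are made up, and jack-knifing is read as holding out each reference in turn so that it can be scored against the remaining ones, which is what makes a human–human score possible.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def split_80_10_10(items: Sequence[str], seed: int = 0):
    """Shuffle, then split into 80% training, 10% validation, 10% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # assumption: a random split
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def jackknife(references: List[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield each reference in turn, paired with the remaining references."""
    for i, held_out in enumerate(references):
        yield held_out, references[:i] + references[i + 1:]


# Hypothetical identifiers for a collection of 1,000 images.
images = [f"img_{i:04d}" for i in range(1000)]
train, val, test = split_80_10_10(images)
print(len(train), len(val), len(test))  # 800 100 100

for held_out, rest in jackknife(["ref A", "ref B", "ref C"]):
    print(held_out, "scored against", rest)
```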
  • 18. Evaluation
      • Optimizing the number of detected objects against the Meteor scores of the generated descriptions
      • Improvements are seen until eight objects
      • Good descriptions do not always need the most confident detections
  • 19. Conclusion
      • Shows that Visual Dependency Representations of images can be inferred without expensive human supervision
      • The quality of the generated text largely depends on the data set
      • The quality of the descriptions depends on whether the images depict an action
      • Encoding the spatial relationships between objects is a useful way of learning how to describe actions
      • Future improvements
          • Broader-coverage object detectors
          • Relaxing the strict mirroring of human annotator behavior when searching for subjects and objects in an image
          • An n-gram based language model constrained by the structures predicted in the VDR