Visual Object Category
Recognition
Ashish Gupta
Centre for Vision, Speech, and Signal Processing
Contents
• Introduction
• Related work
• Overview: Object recognition system
• Object classification & detection
• Conclusions
• Future work
Introduction
Research Topic: Visual object category recognition using
weakly supervised learning.
DIPLECS: Artificial cognitive system for autonomous systems.
• Interested in object interactions determined by
their functional properties.
• All objects in same category have the same
functional properties.
• Recognition is based on object’s visual
properties.
Introduction
Research Topic: Visual object category recognition using
weakly supervised learning.
• A very large training set is required to learn the
large appearance variation in a category.
• So we utilize huge image datasets like Flickr®
and Google™ Images.
• The images are corrupt and incompletely
labelled.
• Therefore, weakly supervised learning is
utilized which can handle corrupt and noisy
training data.
Challenges
• Intra-category appearance
• Pose
• Clutter
• Scale
• Occlusion
• Illumination
• Articulation
• Camouflage
Background
Work done
Visual Recognition System
SIFT feature descriptor
Object model: bag-of-visual words
Occurrence frequency of visual words is characteristic of the object.
Creating a visual codebook
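A minimal sketch of how a visual codebook and a word-frequency histogram could be built. The slides do not state the clustering method; plain k-means over local descriptors is assumed here, and the function names (`build_codebook`, `word_histogram`) and the toy 2-D descriptors are hypothetical stand-ins for 128-D SIFT descriptors.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_word(descriptor, codebook):
    """Index of the visual word (cluster centre) closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(descriptor, codebook[k]))

def build_codebook(descriptors, num_words, iterations=20, seed=0):
    """Assumed method: k-means clustering; each centre is one visual word."""
    rng = random.Random(seed)
    codebook = [list(d) for d in rng.sample(descriptors, num_words)]
    for _ in range(iterations):
        clusters = [[] for _ in range(num_words)]
        for d in descriptors:
            clusters[nearest_word(d, codebook)].append(d)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centre
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

def word_histogram(descriptors, codebook):
    """Normalized occurrence frequency of each visual word in one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

# Toy descriptors forming two well-separated groups.
training = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
codebook = build_codebook(training, num_words=2)
image = [(0.2, 0.1), (10.2, 10.1), (9.9, 10.4), (10.5, 9.8)]
histogram = word_histogram(image, codebook)
```

The histogram, not the raw descriptors, is what represents the image afterwards, which is why the spatial layout of the key-points is lost in this model.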
Object model : bag-of-visual words
A test image can be classified
based on the distance of its
normalized codebook from the
codebooks of positive and negative
training samples.
Codebook positive samples Codebook negative samples Codebook test image
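The distance-based decision described above can be sketched as follows. The slides do not say which distance or which summary of the training samples was used; Euclidean distance to the mean normalized histogram of each class is an assumption made for illustration, and all function names are hypothetical.

```python
import math

def l1_normalize(counts):
    """Scale raw visual-word counts so that they sum to one."""
    total = sum(counts) or 1
    return [c / total for c in counts]

def mean_histogram(histograms):
    """Element-wise mean of several normalized histograms."""
    return [sum(col) / len(histograms) for col in zip(*histograms)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(test_counts, positive_counts, negative_counts):
    """Assign the label whose mean histogram is closer to the test histogram."""
    test = l1_normalize(test_counts)
    positive = mean_histogram([l1_normalize(c) for c in positive_counts])
    negative = mean_histogram([l1_normalize(c) for c in negative_counts])
    if euclidean(test, positive) < euclidean(test, negative):
        return "positive"
    return "negative"

# Toy visual-word counts over a three-word codebook.
positive_samples = [[8, 1, 1], [6, 2, 2]]
negative_samples = [[1, 8, 1], [2, 6, 2]]
label_a = classify([7, 2, 1], positive_samples, negative_samples)
label_b = classify([1, 7, 2], positive_samples, negative_samples)
```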
Object model : bag-of-visual words
Visual codebooks for positive and negative samples of ‘car’ category in
PASCAL VOC 2006
Object model : bag-of-visual words
Visual codebooks for ‘car’ and ‘cow’ categories in PASCAL VOC 2009 dataset
Classification
ROC (Receiver Operating
Characteristics): evaluating
classification performance.
ROC for ‘car’ category in
PASCAL VOC 2006
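For reference, an ROC curve can be traced by sweeping a threshold over the classifier scores and recording the false-positive and true-positive rates. This is a generic sketch, not the evaluation code behind the plots; it assumes distinct scores (ties are not handled) and the names `roc_points` and `area_under_curve` are illustrative.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, true-positive rate) pairs, one per threshold.

    labels are 1 for the positive category and 0 otherwise.
    """
    num_pos = sum(labels)
    num_neg = len(labels) - num_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_pos = false_pos = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label:
            true_pos += 1
        else:
            false_pos += 1
        points.append((false_pos / num_neg, true_pos / num_pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

scores = [0.9, 0.8, 0.7, 0.6]
perfect = area_under_curve(roc_points(scores, [1, 1, 0, 0]))
mixed = area_under_curve(roc_points(scores, [1, 0, 1, 0]))
```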
The linear kernel, $K(x, y) = x^{\top} y$, was used
since it is fast.
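One reason the linear kernel is fast can be sketched as follows: for a kernel classifier such as an SVM (an assumption, since the slides do not name the classifier), the sum over support vectors collapses into a single weight vector, so scoring a test histogram costs one dot product. All names below are hypothetical.

```python
def linear_kernel(x, y):
    """K(x, y) = x^T y."""
    return sum(a * b for a, b in zip(x, y))

def kernel_decision(support_vectors, coefficients, x, bias=0):
    """Generic kernel form: sum_i c_i * K(s_i, x) + b."""
    return sum(c * linear_kernel(s, x)
               for s, c in zip(support_vectors, coefficients)) + bias

def collapse_weights(support_vectors, coefficients):
    """With a linear kernel the same classifier is w = sum_i c_i * s_i."""
    return [sum(c * s[d] for s, c in zip(support_vectors, coefficients))
            for d in range(len(support_vectors[0]))]

def linear_decision(weights, x, bias=0):
    """Equivalent decision value computed with a single dot product."""
    return linear_kernel(weights, x) + bias

support_vectors = [[1, 0, 2], [0, 3, 1]]
coefficients = [2, -1]          # hypothetical signed dual coefficients
x = [1, 1, 1]
weights = collapse_weights(support_vectors, coefficients)
slow = kernel_decision(support_vectors, coefficients, x)
fast = linear_decision(weights, x)
```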
Improve Classification
Larger Visual Codebook:
• More representative of category
• Higher computational cost
ROC of ‘car’ category in PASCAL VOC
2006 for codebook sizes from 20 to
20000 visual words.
Improve Classification
Training and test images in the
dataset scaled down by same factor.
Training and test images scaled down by
different factors.
Improve Classification
Scale down factor   Training Samples Dataset 1   Training Samples Dataset 2
/1                  Y                            N
/2                  Y                            Y
(Y/N: test image classified correctly)
Improve Classification
ROC for 20 visual categories in
PASCAL VOC 2009
The PASCAL VOC 2009 dataset is
larger and more challenging than the
2006 dataset.
Improve Classification
ROC for PASCAL VOC 2009 training
and test images scaled down
by a factor of 2
ROC for PASCAL VOC 2009 using a
universal visual vocabulary
Object localization using sliding window
The poor localization results are due to:
• Lack of structural information in the
bag-of-words object model
• Classifier learning object background
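A minimal sketch of sliding-window localization over a bag-of-words model, to make the first cause above concrete. It assumes key-points are given as `(x, y, word_index)` triples and that each window is scored with a linear classifier on its normalized word histogram; the weights, window size and stride are made-up values.

```python
def window_histogram(keypoints, box, num_words):
    """Normalized visual-word histogram of the key-points inside box."""
    x0, y0, x1, y1 = box
    counts = [0] * num_words
    for x, y, word in keypoints:
        if x0 <= x < x1 and y0 <= y < y1:
            counts[word] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def sliding_window(keypoints, image_size, window_size, stride, weights):
    """Return the best-scoring window and its score."""
    width, height = image_size
    win_w, win_h = window_size
    best_box, best_score = None, float("-inf")
    for y0 in range(0, height - win_h + 1, stride):
        for x0 in range(0, width - win_w + 1, stride):
            box = (x0, y0, x0 + win_w, y0 + win_h)
            hist = window_histogram(keypoints, box, len(weights))
            score = sum(w * h for w, h in zip(weights, hist))
            if score > best_score:
                best_box, best_score = box, score
    return best_box, best_score

# Word 0 stands for an object word, word 1 for a background word.
keypoints = [(12, 12, 0), (14, 13, 0), (13, 15, 0),
             (2, 2, 1), (30, 30, 1), (35, 5, 1)]
box, score = sliding_window(keypoints, (40, 40), (10, 10), 10, [1.0, -1.0])
```

Because the score depends only on which words fall inside the window, any arrangement of the same words scores identically, and a classifier whose weights favour background words would pull the window toward the background.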
Visual codebook
Training images with
bounding-boxes
Training images without
bounding-boxes
Good Codebook with equal population of
positive and negative visual words
Positive background different
from negative images
Positive background similar to
negative images
With no bounding-box
utilized, the codebook
consists of a majority of
negative visual words.
Visual codebook
Classification based on
object context
(background) rather than
object features.
Improve Classification
Detection at each iteration estimates a bounding box, which yields a better
visual codebook, which in turn leads to better detection.
Object detection
• Key-point configurations are a discriminative
object feature set.
• A configuration of visual words adds structural
information to the bag-of-words model.
• Harvest frequent and discriminative configurations.
• Encode configurations as transaction vectors.
• The association between a transaction vector and the
training type is an association rule.
• The Apriori algorithm finds association rules with high
confidence in a support-confidence framework.
Figure: transaction vector encoding a key-point configuration.
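The support-confidence framework can be sketched with a few lines of code. The exact transaction encoding is not given in the slides; here a transaction is assumed to be the set of visual words found within a fixed radius of a key-point, paired with the label of its training image. The function names and toy data are hypothetical.

```python
def make_transaction(center, keypoints, radius):
    """Assumed encoding: the set of visual words near a centre key-point."""
    cx, cy = center
    return frozenset(word for x, y, word in keypoints
                     if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2)

def support(itemset, labelled_transactions):
    """Fraction of all transactions that contain the configuration."""
    hits = sum(1 for items, _ in labelled_transactions if itemset <= items)
    return hits / len(labelled_transactions)

def confidence(itemset, label, labelled_transactions):
    """Of the transactions containing the configuration, the fraction with label."""
    labels = [lab for items, lab in labelled_transactions if itemset <= items]
    if not labels:
        return 0.0
    return sum(1 for lab in labels if lab == label) / len(labels)

transactions = [
    (frozenset({1, 2, 3}), "car"),
    (frozenset({1, 2}), "car"),
    (frozenset({1, 4}), "background"),
    (frozenset({2, 3}), "car"),
    (frozenset({1, 2, 4}), "background"),
]
pair = frozenset({1, 2})
support_pair = support(pair, transactions)
confidence_pair = confidence(pair, "car", transactions)
confidence_23 = confidence(frozenset({2, 3}), "car", transactions)
neighbourhood = make_transaction((0, 0), [(1, 0, 5), (0, 2, 7), (10, 10, 9)], 3)
```

In this toy data the rule "{2, 3} implies car" has lower support than "{1, 2} implies car" but higher confidence.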
Apriori algorithm
• Uses breadth-first search and a tree structure.
• Longer configurations will have lower support as
they are infrequent but higher confidence as they
are more discriminative.
• Downward closure lemma: every subset of a frequent
configuration is itself frequent, so configurations
with infrequent subsets can be pruned.
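A compact sketch of the level-wise (breadth-first) Apriori search with downward-closure pruning. It is a textbook-style illustration rather than the implementation used in this work: it counts support by scanning the transaction list instead of using a tree, and the `min_support` value and toy transactions are made up.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frequent itemset: support}, growing itemsets one level at a time."""
    total = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / total

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]
    level = [s for s in level if support(s) >= min_support]
    frequent = {}
    size = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        known = set(level)
        # Join step: merge frequent itemsets into candidates one item longer.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == size + 1}
        # Downward closure: drop a candidate if any subset is infrequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in known
                             for sub in combinations(c, size))}
        level = [c for c in candidates if support(c) >= min_support]
        size += 1
    return frequent

transactions = [frozenset(t) for t in
                ({1, 2, 3, 4}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3})]
frequent = apriori(transactions, min_support=0.6)
```

Item 4 is infrequent on its own, so no configuration containing it is ever counted, which is the saving the pruning step provides.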
Object localization
[Flow diagram. Training: Training Data Set → Generate Transactions → Apriori data mining → Association Rules. Testing: Test Image (from the Test Data Set) → Generate Transactions → Generate Confidence for each Transaction → Threshold Confidence.]
• A confidence is assigned to every
key-point in the image.
• Key-points with sufficiently high
confidence are retained.
• Key-points which occur on
common background objects like
doors and windows can have high
confidence.
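The thresholding step can be sketched as below. Taking the bounding box of the retained key-points as the localization result is an assumption made for illustration, as are the threshold and the toy coordinates.

```python
def localize(keypoints, confidences, threshold):
    """Keep key-points with confidence >= threshold and box them.

    Returns (x_min, y_min, x_max, y_max), or None if nothing is retained.
    """
    kept = [point for point, conf in zip(keypoints, confidences)
            if conf >= threshold]
    if not kept:
        return None
    xs, ys = zip(*kept)
    return (min(xs), min(ys), max(xs), max(ys))

keypoints = [(10, 10), (20, 15), (15, 25), (90, 5)]
# The last key-point lies on the background.
clean = localize(keypoints, [0.9, 0.8, 0.85, 0.3], threshold=0.5)
# Same image, but the background key-point receives a high confidence.
inflated = localize(keypoints, [0.9, 0.8, 0.85, 0.95], threshold=0.5)
```

The second call shows the failure mode named in the last bullet: a single confident background key-point stretches the estimated box well beyond the object.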
Object classification using Apriori
[Flow diagram. Training: Training Data Set → Generate Transactions → Apriori data mining → Association Rules. Testing: Test Images (from the Test Data Set) → Generate Transactions → Generate Confidence for each Transaction → Sum Confidence.]
ROC ‘car’ in PASCAL VOC 2006
The summed confidence score depends
upon object scale in the image, which
explains the comparatively poor
performance of this approach.
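A small sketch of why a summed score depends on object scale. It assumes, for illustration only, that a larger object yields more key-points and hence more transactions of similar confidence; the confidence values and the decision threshold are made up.

```python
def summed_confidence(transaction_confidences):
    """Image-level score: the sum of all transaction confidences."""
    return sum(transaction_confidences)

def is_positive(transaction_confidences, threshold):
    """Classify the image by comparing the summed score to a threshold."""
    return summed_confidence(transaction_confidences) >= threshold

# The same object seen small (few key-points) and large (many key-points).
small_object = [0.8] * 5
large_object = [0.8] * 20
small_result = is_positive(small_object, threshold=10.0)
large_result = is_positive(large_object, threshold=10.0)
```

Both images contain the object with identical per-transaction confidence, yet only the large one passes the threshold.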
Conclusions
• The ‘bag-of-words’ model is good for classification, but poor for localization.
• Separate foreground-background for better visual codebooks.
• The good classification using PASCAL VOC 2006 dataset is attributed to
recognition of object context rather than object features.
• The dataset utilized should have sufficient variation in appearance of the
object and its background.
• A larger visual vocabulary gives slightly better classification, but is
computationally more expensive.
• The visual vocabulary built has a majority of background visual words since
bounding-boxes are not utilized during training.
Conclusions
• Improving the proportion of visual words representing the object in the
vocabulary is vital for good classification.
• Incorporate the object boundary contour into the descriptor.
• Use of frequent and discriminative key-point configurations is a promising
approach for object localization.
• A low quality dataset results in a weak visual codebook and classifiers biased
to the training data.
• Classification using key-point configurations was poor compared to
‘bag-of-words’ for PASCAL VOC 2006.
Future Work
• Improve a visual codebook by increasing the proportion of visual words
pertaining to object features. Combine Apriori based localization and
clustering for visual word selection in an iterative approach.
• Model visual scene information (use the GIST descriptor by Torralba). Learn
co-occurrence statistics of a scene and a visual category. Recognition of the
scene serves as prior for object presence and improves object recognition
performance.
• Improve object localization by using context priming.
• Model object contextual information to aid foreground-background
disambiguation for better object localization.
Future Work
• Share information of features between visual categories. The size of a
universal visual vocabulary should increase sub-linearly with increase in
number of visual categories.
• Combine image segmentation and classification to improve the object
model to provide better classification performance.
• Build a hierarchical framework for visual categorization:
• Representation: combine local and global features.
• Model: combine semantic and structural object models.
• Classification: combine generative and discriminative approaches.
Future Work
Questions?

Visual Object Category Recognition

  • 1. Visual Object Category Recognition Ashish Gupta Centre for Vision, Speech, and Signal Processing
  • 2. Contents • Introduction • Related work • Overview: Object recognition system • Object classification & detection • Conclusions • Future work
  • 3. Introduction Research Topic: Visual object category recognition using weakly supervised learning. DIPLECS: Artificial cognitive system for autonomous systems. • Interested in object interactions determined by their functional properties. • All objects in same category have the same functional properties. • Recognition is based on object’s visual properties.
  • 4. Introduction Research Topic: Visual object category recognition using weakly supervised learning. • A very large training set is required to learn the large appearance variation in a category. • So we utilize huge image datasets like Flickr® and GoogleTM Image. • The images are corrupt and incompletely labelled. • Therefore, weakly supervised learning is utilized which can handle corrupt and noisy training data.
  • 10. Occurrence frequency of visual words is characteristic of the object Object model : bag-of-visual words Creating a visual codebook
  • 11. Object model : bag-of-visual words A test image can be classified based on the distance of its normalized codebook from the codebooks of positive and negative training samples. Codebook positive samples Codebook negative samples Codebook test image
  • 12. Object model : bag-of-visual words Visual codebooks for positive and negative samples of ‘car’ category in PASCAL VOC 2006
  • 13. Object model : bag-of-visual words Visual codebooks for ‘car’ and ‘cow’ categories in PASCAL VOC 2009 dataset
  • 14. Classification ROC (Receiver Operating Characteristics): evaluating classification performance. ROC for ‘car’ category in PASCAL VOC 2006 The linear kernel: K(x,y) = xTy, was used since it is fast.
  • 15. Improve Classification Larger Visual Codebook: • More representative of category • Higher computational cost ROC of ‘car’ category in PACAL VOC 2006 for codebook sizes from 20 to 20000 visual words.
  • 17. Improve Classification Training and test images in the dataset scaled down by same factor. Training and test images scaled down by different factors.
  • 18. Improve Classification Training Samples Dataset 1 Training Samples Dataset 2Scale down factor /1 /2 Y N Y Y Test Image Image classified correctly
  • 19. Improve Classification ROC for 20 visual categories in PASCAL VOC 2009 The PACAL VOC 2009 dataset is larger and more challenging than the 2006 dataset.
  • 20. Improve Classification ROC for PASCAL VOC 2009 training and test images images scaled down by factor of 2 ROC for PASCAL VOC 2009 using a universal visual vocabulary
  • 21. Object localization using sliding window The poor localization results are due to: • Lack of structural information in the bag-of-words object model • Classifier learning object background
  • 22. Visual codebook Training images with bounding-boxes Training images without bounding-boxes Good Codebook with equal population of positive and negative visual words Positive background different from negative images Positive background similar to negative images With no bounding-box utilized, the codebook consists of a majority of negative visual words.
  • 23. Visual codebook Training images with bounding-boxes Training images without bounding-boxes Good Codebook with equal population of positive and negative visual words Positive background different from negative images Positive background similar to negative images Classification based on object context (background) rather than object features.
  • 24. Improve Classification The detection at each iteration estimates a bounding box which provides a better visual codebook which in turn leads to better detection.
  • 25. • Key-point configurations as features are a discriminative object feature set. • A configuration of visual words appends structural information to the bag-of-words model. Object detection • Harvest frequent and discriminative configurations. • Encode configurations as transaction vectors. • Association between a transaction vector and the training type is an association rule. • Apriori algorithm finds association rules with high confidence in a support-confidence framework. Transaction vector encoding key-point configuration
  • 26. Apriori algorithm • Uses breadth-first search and tree structure. • Longer configurations will have lower support as they are infrequent but higher confidence as they are more discriminative. • Downward closure lemma: prune configurations with infrequent sub-sets.
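The level-wise (breadth-first) search and downward-closure pruning can be sketched in a few lines. This is a minimal Apriori over sets of hypothetical visual-word identifiers, not the encoding used in the slides; the transactions, labels, and the `min_support` value are illustrative assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    min_count = min_support * len(transactions)

    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    singles = {frozenset([item]) for t in transactions for item in t}
    level = {c: s for c, s in count(singles).items() if s >= min_count}
    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Downward closure: drop candidates with any infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c: s for c, s in count(candidates).items() if s >= min_count}
    return frequent

def rule_confidence(itemset, label, transactions, labels):
    """Confidence of the rule itemset -> label: P(label | itemset present)."""
    covered = [lab for t, lab in zip(transactions, labels) if itemset <= t]
    if not covered:
        return 0.0
    return sum(1 for lab in covered if lab == label) / len(covered)

# Toy transactions: each is the set of visual words in one key-point configuration.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bd")]
labels = ["pos", "pos", "neg", "neg"]
frequent = apriori(transactions, min_support=0.5)
conf_a = rule_confidence(frozenset("a"), "pos", transactions, labels)
conf_ab = rule_confidence(frozenset("ab"), "pos", transactions, labels)
```

In this toy data the longer configuration {a, b} has lower support than {a} (2 transactions against 3) but higher confidence for the positive class, and {a, b, c} is never counted because its subset {b, c} is infrequent.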
  • 27. Object localization Training Data Set Test Data Set Test Image Generate Transactions Transactions Apriori data mining Association Rules Generate Confidence for each Transaction Threshold Confidence Transactions • A confidence is assigned to every key-point in the image. • Key-points with sufficiently high confidence are retained. • Key-points which occur on common background objects like doors and windows can have high confidence.
  • 28. Object classification using Apriori Training Data Set Test Data Set Generate Transactions Transactions Apriori data mining Association Rules Generate Confidence for each Transaction Sum Confidence Transactions Test Images ROC ‘car’ in PASCAL VOC 2006 The summed confidence score depends upon object scale in the image, which explains the comparatively poor performance of this approach.
  • 29. Conclusions • The ‘bag-of-words’ model is good for classification, but poor for localization. • Separate foreground-background for better visual codebooks. • The good classification using PASCAL VOC 2006 dataset is attributed to recognition of object context rather than object features. • The dataset utilized should have sufficient variation in appearance of the object and its background. • Larger visual vocabulary gives slightly better classification, but is computationally more expensive. • The visual vocabulary built has a majority of background visual words since bounding-boxes are not utilized during training.
  • 30. Conclusions • Improving the proportion of visual words representing the object in the vocabulary is vital for good classification. • Incorporate object boundary contour into the descriptor. • Use of frequent and discriminative key-point configurations is a promising approach for object localization. • A low quality dataset results in a weak visual codebook and classifiers biased to the training data. • Classification using key-point configurations was poor compared to ‘bag-of-words’ for PASCAL VOC 2006.
  • 31. Future Work • Improve a visual codebook by increasing the proportion of visual words pertaining to object features. Combine Apriori based localization and clustering for visual word selection in an iterative approach. • Model visual scene information (Use the GIST descriptor by Torralba). Learn co-occurrence statistics of a scene and a visual category. Recognition of the scene serves as prior for object presence and improves object recognition performance. • Improve object localization by using context priming. • Model object contextual information to aid foreground-background disambiguation for better object localization.
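The scale dependence of the summed confidence score can be seen in a small sketch. The slides do not say how a transaction's confidence is derived from the mined rules, so this example assumes it is the highest confidence among the rules whose configuration is contained in the transaction; the rule set and transactions are hypothetical.

```python
def image_score(transactions, rules):
    """Sum, over an image's transactions, of the best matching rule confidence."""
    total = 0.0
    for t in transactions:
        matching = [conf for itemset, conf in rules.items() if itemset <= t]
        total += max(matching, default=0.0)
    return total

# Hypothetical mined rules: configuration -> confidence for the 'car' class.
rules = {frozenset({"w1", "w2"}): 0.9, frozenset({"w3"}): 0.6}

# The same object seen small (few key-points) and large (three times as many).
small_car = [frozenset({"w1", "w2"}), frozenset({"w3"})]
large_car = small_car * 3

small_score = image_score(small_car, rules)
large_score = image_score(large_car, rules)
```

Both images show the same category, yet the larger object yields three times the score simply because it produces more transactions, so a single threshold on the sum treats them differently.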
  • 32. Future Work • Share information of features between visual categories. The size of a universal visual vocabulary should increase sub-linearly with increase in number of visual categories. • Combine image segmentation and classification to improve the object model to provide better classification performance. • Build a hierarchical framework for visual categorization: • Representation: combine local and global features. • Model: combine semantic and structural object models. • Classification: combine generative and discriminative approaches.