Large-Scale Object
Recognition
Presenter: Po-Jen Lai (賴柏任), second-year master's student, Electrical Engineering
Date: 06.18.2015
Motivation
• People can recognize tens of thousands
of objects...
• How about computers?
2
3
The paper "What does classifying more than
10,000 image categories tell us?"
tries to answer this question
Deng, Jia, et al. "What does classifying more than 10,000 image categories tell
us?." Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010. 71-84.
Datasets
• ImageNet10K
– 10184 categories, 9 million images
• ImageNet7K (7404 categories)
• ImageNet1K (1000 categories)
• Rand200{a,b,c} (200 categories)
• CalNet200 (200 categories)
• Ungulate183, Fungus134, Vehicle262
4
Algorithms
• GIST+NN
– kNN on L2 distance
• BOW + NN
– SIFT for BOW, kNN on L1 distance
• BOW + SVM
– One SVM per category (1-vs-all)
• SPM + SVM
– SIFT for SPM, 1-vs-all SVM
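The nearest-neighbor baselines above can be sketched as follows. This is a minimal illustration in plain Python, not the paper's implementation: the toy 3-bin histograms, the labels, and the helper names (`l1_distance`, `l2_distance`, `knn_predict`) are all assumptions made for the example.

```python
from collections import Counter


def l1_distance(a, b):
    """Sum of absolute differences (the slides pair this with BOW histograms)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance (the slides pair this with GIST features)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def knn_predict(query, train_features, train_labels, k, distance):
    """Return the majority label among the k nearest training samples."""
    ranked = sorted(
        range(len(train_features)),
        key=lambda i: distance(query, train_features[i]),
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy 3-bin histograms; values and labels are made up for illustration.
train_features = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.1, 0.1, 0.8], [0.0, 0.2, 0.8]]
train_labels = ["cat", "cat", "car", "car"]
query = [0.7, 0.2, 0.1]

label_l1 = knn_predict(query, train_features, train_labels, k=3, distance=l1_distance)
label_l2 = knn_predict(query, train_features, train_labels, k=3, distance=l2_distance)
print(label_l1, label_l2)
```

A brute-force scan like this costs one distance computation per training image, which is part of why testing gets expensive at ImageNet10K scale.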
5
Computation time analysis
• BOW+SVM (ImageNet10K)
– Training one 1-vs-all SVM classifier takes 1 hr
(2.66 GHz Intel Xeon)
– Testing takes 16 hrs
• Even a cluster of 66 multi-core machines needs
several weeks
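A back-of-the-envelope check of the training cost, assuming the 1 hr figure applies to each of the 10184 1-vs-all classifiers and that they are trained one after another on a single CPU. It covers BOW+SVM training only, so it is a lower bound on the full workload; the function name is hypothetical.

```python
NUM_CATEGORIES = 10184        # ImageNet10K category count from the slides
TRAIN_HOURS_PER_SVM = 1.0     # 1 hr per 1-vs-all classifier, from the slides


def sequential_training_hours(num_categories, hours_per_classifier):
    """Total CPU hours if every 1-vs-all SVM is trained one after another."""
    return num_categories * hours_per_classifier


total_hours = sequential_training_hours(NUM_CATEGORIES, TRAIN_HOURS_PER_SVM)
total_days = total_hours / 24
print(f"{total_hours:.0f} CPU hours, about {total_days:.0f} CPU days")
```

More than a CPU-year for a single method is what motivates the call for distributed computing.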
6
Distributed computing and efficient
learning are needed.
Size analysis
• Accuracy drops by roughly 2x for every 10x
increase in the number of classes
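Read as a rule of thumb, the trend above is a power law: accuracy is halved for each decade of classes. The sketch below only illustrates that arithmetic; the reference point (40% at 100 classes) is hypothetical and not a number from the paper.

```python
import math


def extrapolate_accuracy(ref_accuracy, ref_classes, target_classes):
    """Halve the accuracy for every 10x increase in the number of classes."""
    decades = math.log10(target_classes / ref_classes)
    return ref_accuracy * 0.5 ** decades


# Hypothetical reference point: 40% accuracy at 100 classes.
for n in (100, 1000, 10000):
    print(n, round(extrapolate_accuracy(0.40, 100, n), 3))
```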
7
Size analysis
• Techniques that outperform others on
small datasets may underperform on
large datasets
8
Size analysis
• The semantic hierarchy is correlated with
visual confusion
9
Density Analysis
• Density of a dataset
10
Density Analysis
• Denser datasets predict lower accuracy
11
12
From large scale image categorization
to entry-level categories
Ordonez, Vicente, et al. "From large scale image categorization to entry-level
categories." Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 2013.
Motivation
• One image can have many valid labels;
which one should we actually use?
13
Entry-Level
category
Definition of entry-level category
• The name that most people tend to use for an object
– Yuan Zai (圓仔), panda, mammal, Ailuropoda
melanoleuca (scientific name)
14
To achieve entry-level recognition
• By hypernym?
– Simply replace the classifier output with its
hypernym
15
Bird
sparrow penguin
Problem 1
• You may call a sparrow a bird, but you
may not call a penguin a bird
16
Bird
sparrow penguin
Problem 2
• Encyclopedic knowledge vs. common-
sense knowledge
17
Tulip is not a
kind of flower.
What a beautiful
flower!
Two methods
• Translate the result to entry-level category
• Directly learn an entry-level classifier
18
Image
Classifier
Tulip Flower
Image
Classifier
Flower
Method 1
• Use a metric for scoring each node
19
Bird
sparrow penguin
Output of linear SVM
0.8 0.1
0.9
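A minimal sketch of scoring every node in the hierarchy from the leaf classifier outputs. The slide does not state how an internal node gets its score; summing the scores of the leaves below it is an assumption here, chosen because it matches the figure's numbers (0.8 + 0.1 = 0.9). Which leaf carries which score is also assumed.

```python
def node_score(node, children, leaf_scores):
    """Score a node as the sum of the leaf scores in its subtree."""
    kids = children.get(node, [])
    if not kids:
        return leaf_scores[node]
    return sum(node_score(kid, children, leaf_scores) for kid in kids)


# Toy hierarchy from the slide; the assignment of scores to leaves is assumed.
children = {"bird": ["sparrow", "penguin"]}
leaf_scores = {"sparrow": 0.8, "penguin": 0.1}

bird_score = node_score("bird", children, leaf_scores)
print(round(bird_score, 2))
```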
Method 1
• Add the concept of naturalness
• We want v to be natural, but not so
high-level that it loses specificity
20
Naturalness: the more often v appears in the
Google 1T corpus, the higher φ(v)
Level: the max height of the
tree under v
Method 1
• Combine the two scores
• Experiments are skipped since there are
too many details...
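One way the two scores could be combined is sketched below. The slides do not show the exact formula, so the product form (visual score times naturalness minus a height penalty), the trade-off weight `LAMBDA`, and every number in `nodes` are assumptions for illustration only.

```python
import math


def combined_score(visual, ngram_count, height, lam):
    """Visual score weighted by naturalness minus a height penalty.

    The product form is an assumption for illustration, not taken from the slides.
    """
    naturalness = math.log(ngram_count)   # phi(v): grows with corpus frequency
    return visual * (naturalness - lam * height)


# Hypothetical nodes: (visual score, n-gram count, max subtree height).
nodes = {
    "sparrow": (0.8, 1_000_000, 0),
    "bird": (0.9, 50_000_000, 1),
    "animal": (0.95, 60_000_000, 5),
}
LAMBDA = 1.0  # hypothetical trade-off weight

best = max(nodes, key=lambda name: combined_score(*nodes[name], LAMBDA))
print(best)
```

With these made-up numbers the middle node wins: it is far more frequent in text than the leaf, and it is penalized less for height than the top-level node.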
21
Method 2
• Skipped...
22
23
What I have learned from the two
papers above
An interesting perspective...
• Why can we (as humans) recognize tens
of thousands of objects in a really short
time?
• We have simplified the world; otherwise
– We would process things slowly (computation cost)
– We would have to take in lots of information (memory cost)
24
Just my own observation XD
An explanation for the paper
• Different kinds of dolphins have similar
properties
– So why bother to know every kind of dolphin?
• Dolphins share many properties with fish
– So people think they are a kind of fish
25
How do we simplify?
• Hierarchy matters
• But do we follow WordNet?
26
Probably No
• Natural Objects
– We identify them by properties
• Artifacts
– We identify them by functionalities
27
Support from the paper
• Even when the result is incorrect,
animals tend to be miscategorized
as other animals
29
Deng, Jia, et al. "What does classifying more than 10,000 image categories tell
us?." Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010. 71-84.
30
Maybe it’s because the logic behind
making things is different.
(God vs. human)
Artifacts are
made for humans
to use.
Natural objects
are made to live
their lives.
How to implement?
• It is still an open question.
31
Yao, Bangpeng, Jiayuan Ma, and Li Fei-Fei. "Discovering object
functionality." Computer Vision (ICCV), 2013 IEEE International Conference on.
IEEE, 2013.
Woods, Kevin, et al. "Learning membership functions in a function-based
object recognition system." J. Artif. Intell. Res. (JAIR) 3 (1995): 187-222.
Weng, Juyang, and Matthew Luciw. "Brain-like emergent spatial
processing." Autonomous Mental Development, IEEE Transactions on 4.2 (2012):
161-185.
32
Improving the Fisher Kernel for Large-
Scale Image Classification
Perronnin, Florent, Jorge Sánchez, and Thomas Mensink. "Improving the fisher kernel
for large-scale image classification." Computer Vision–ECCV 2010. Springer Berlin
Heidelberg, 2010. 143-156.
Fisher vector revisit
• A representation of an image
– Input: a set of local descriptors
– Output: a fixed-length Fisher vector
33
Fisher vector revisit
• Use GMM to model input images
34
Fisher vector revisit
• Assume: only 2 Gaussians are used
35
Fisher vector revisit
• For each image, N=2
36
Perronnin, Florent, and Christopher Dance. "Fisher kernels on visual vocabularies
for image categorization." Computer Vision and Pattern Recognition, 2007.
CVPR'07. IEEE Conference on.
Fisher vector revisit
• Since we already know the GMM of that
image, we can take derivatives
• Derivatives
– describe how a change in the parameters changes
how well the GMM fits the image
37
Fisher vector revisit
• Concatenating these derivatives gives the
Fisher vector!
38
The number of parameters is
the same for every image
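A minimal sketch of the idea, assuming 1-D descriptors and a hypothetical 2-Gaussian GMM to keep it short (real local descriptors are multi-dimensional). The gradient expressions with respect to the means and standard deviations, and the scaling by the mixture weights, follow the closed form commonly given for Fisher vectors; the slides do not show the formulas, so treat them as an assumption. The point to notice is that the output length depends only on the number of Gaussians, not on how many descriptors the image has.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fisher_vector(descriptors, weights, means, stds):
    """Gradients w.r.t. the means and standard deviations of a 1-D GMM."""
    count = len(descriptors)
    grad_means = [0.0] * len(weights)
    grad_stds = [0.0] * len(weights)
    for x in descriptors:
        # Soft assignment (posterior) of descriptor x to each Gaussian.
        likelihoods = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
        total = sum(likelihoods)
        for k, likelihood in enumerate(likelihoods):
            gamma = likelihood / total
            z = (x - means[k]) / stds[k]
            grad_means[k] += gamma * z
            grad_stds[k] += gamma * (z * z - 1.0)
    for k, w in enumerate(weights):
        grad_means[k] /= count * math.sqrt(w)
        grad_stds[k] /= count * math.sqrt(2.0 * w)
    # Concatenating the per-Gaussian gradients gives the fixed-length vector.
    return grad_means + grad_stds


# Hypothetical 2-Gaussian GMM and two "images" with different descriptor counts.
weights, means, stds = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
fv_small = fisher_vector([-1.2, 0.9, 1.1], weights, means, stds)
fv_large = fisher_vector([-0.8, -1.1, 0.2, 1.3, 0.7, 1.0], weights, means, stds)
print(len(fv_small), len(fv_large))
```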
Fisher vector revisit
• The form of Fisher Vector
– Local descriptors
– Fisher Vector (not normalized)
39
Improvement - L2 Normalization
• Assume: the descriptors of a given
image follow a distribution p
• p has two parts
– background part uλ (image independent)
– Image-specific part q
40
Improvement - L2 Normalization
• Decompose the vector
41
Improvement - L2 Normalization
• The learning process minimizes the
image-independent part
42
Improvement - L2 Normalization
• To remove the dependence on ω, we
can L2-normalize the vector
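A small sketch of why L2 normalization removes the dependence on ω. It assumes ω acts as a scalar weight on the image-specific part of the vector, which the slides do not spell out; the vector values are hypothetical. Two images with the same image-specific content but different ω end up with the same normalized vector.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return list(vector)
    return [v / norm for v in vector]


# Hypothetical image-specific component, scaled by two different values of omega.
image_specific = [0.3, -0.4, 1.2]
for omega in (0.2, 0.9):
    scaled = [omega * v for v in image_specific]
    print(omega, [round(v, 4) for v in l2_normalize(scaled)])
```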
43
Improvement - Power Normalization
• As the number of Gaussians increases,
the Fisher vector becomes sparser
44
[Figure panels: 16, 64, and 256 Gaussians]
Improvement - Power Normalization
• Apply power normalization to each
dimension of Fisher vector
• α=0.5 for 256 Gaussians is reasonable
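Power normalization can be sketched as a signed power applied to each dimension, f(z) = sign(z)|z|^α. The formula is not shown on the slide and is stated here as the commonly used form; the example vector is hypothetical. With α = 0.5 the small values are boosted relative to the large ones, which makes a sparse vector less peaked around zero.

```python
import math


def power_normalize(vector, alpha=0.5):
    """Apply sign(z) * |z| ** alpha to every dimension."""
    return [math.copysign(abs(z) ** alpha, z) for z in vector]


# Hypothetical sparse Fisher vector: most values near zero, a few large ones.
sparse_fv = [0.0, 0.0001, -0.0004, 0.64, -0.25]
print(power_normalize(sparse_fv))
```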
45
Improvement - Spatial Pyramid
• Original spatial pyramid
46
Improvement - Spatial Pyramid
• Combine spatial pyramid and FK
47
BoW histogram
Improvement - Spatial Pyramid
• Combine spatial pyramid and FK
48
Fisher Vector
Large-Scale Experiments
• Training: ImageNet, Flickr groups, VOC
2007 trainval
• Testing: PASCAL VOC 2007 (20 classes)
49
[29] Harzallah, Hedi, Frédéric Jurie, and Cordelia Schmid. "Combining efficient
object localization and image classification." Computer Vision, 2009 IEEE 12th
International Conference on.
Another thing I want to share
• Deep Learning can be used in robotics!
50
Deep Learning for Detecting Robotic Grasps, Ian Lenz, Honglak Lee, Ashutosh
Saxena. To appear in International Journal of Robotics Research (IJRR), 2014.
51
Thanks for your attention.

Long journey of Ruby standard library at RubyConf AU 2024
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 

Large scale object recognition (AMMAI presentation)

  • 2. Motivation • People can recognize tens of thousands of objects... • How about computers?
  • 3. "What does classifying more than 10,000 image categories tell us?" tries to answer this question. Deng, Jia, et al. "What does classifying more than 10,000 image categories tell us?" Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010. 71-84.
  • 4. Datasets • ImageNet10K – 10184 categories, 9 million images • ImageNet7K (7404 categories) • ImageNet1K (1000 categories) • Rand200{a,b,c} (200 categories) • CalNet200 (200 categories) • Ungulate183, Fungus134, Vehicle262
  • 5. Algorithms • GIST + NN – kNN on L2 distance • BOW + NN – SIFT for BOW, kNN on L1 distance • BOW + SVM – one SVM per category (1-vs-all) • SPM + SVM – SIFT for SPM, 1-vs-all SVM
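
The classifiers listed on slide 5 can be illustrated in a few lines. This is a minimal sketch, assuming features (GIST, BOW or SPM vectors) are already extracted as plain lists of numbers and that the per-category linear SVMs are already trained; the function names and the `(weights, bias)` layout are hypothetical and not taken from the paper.

```python
from collections import Counter


def l1_distance(a, b):
    """L1 (Manhattan) distance, as used for BOW + NN on the slide."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """L2 (Euclidean) distance, as used for GIST + NN on the slide."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def knn_predict(query, train, k, dist):
    """Majority vote among the k training items closest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def one_vs_all_predict(x, classifiers):
    """Pick the category whose linear SVM gives the highest score.

    `classifiers` maps label -> (weights, bias); one entry per category,
    which is why the cost grows linearly with the number of categories.
    """
    def score(label):
        weights, bias = classifiers[label]
        return sum(w * xi for w, xi in zip(weights, x)) + bias

    return max(classifiers, key=score)
```

With 10,000 categories, `one_vs_all_predict` evaluates 10,000 linear classifiers per image, which is one way to see why the computation-time slide that follows matters.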
  • 6. Computation time analysis • BOW + SVM (ImageNet10K) – One 1-vs-all SVM classifier takes 1 hr to train (2.66 GHz Intel Xeon) – Testing takes 16 hrs • Even a cluster of 66 multi-core machines needs several weeks • Distributed computing and efficient learning are needed.
  • 7. Size analysis • Accuracy drops roughly 2x for every 10x increase in the number of classes
  • 8. Size analysis • Techniques that outperform others on small datasets may underperform on large datasets
  • 9. Size analysis • The semantic hierarchy is correlated with visual confusion
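
The rule of thumb on slide 7 can be turned into a rough extrapolation formula. This is a minimal sketch, assuming the 2x-per-10x trend holds as an exact power law, whereas the slides only state it as an observed tendency; the function name and the example numbers are hypothetical and are not results from the paper.

```python
import math


def extrapolate_accuracy(acc_ref, n_ref, n_new):
    """Scale a reference accuracy to a different number of classes.

    Assumes accuracy halves for every tenfold increase in classes, i.e.
    acc_new = acc_ref * 2 ** (-log10(n_new / n_ref)).
    """
    decades = math.log10(n_new / n_ref)
    return acc_ref * 2.0 ** (-decades)


# Hypothetical numbers: 40% accuracy at 1,000 classes, scaled to 10,000.
print(extrapolate_accuracy(0.40, 1000, 10000))
```

Equivalently, under this assumption accuracy is proportional to N^(-log10 2), about N^(-0.30), where N is the number of classes.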
  • 10. Density Analysis • Density of a dataset
  • 11. Density Analysis • Denser datasets predict lower accuracy
  • 12. From large scale image categorization to entry-level categories. Ordonez, Vicente, et al. "From large scale image categorization to entry-level categories." Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 2013.
  • 13. Motivation • One image has many labels. What should I actually call it? • Entry-level category
  • 14. Definition of entry-level category • The name that most people tend to use – e.g. Yuan Zai (圓仔), panda, mammal, Ailuropoda melanoleuca (scientific name)
  • 15. To achieve entry-level recognition • By hypernym? – Just replace the classifier output with its hypernym • (Diagram: Bird is the hypernym of sparrow and penguin)
  • 16. Problem 1 • You may call a sparrow a bird, but you may not call a penguin a bird • (Diagram: Bird is the hypernym of sparrow and penguin)
  • 17. Problem 2 • Encyclopedic knowledge vs. common-sense knowledge • "Tulip is not a kind of flower." vs. "What a beautiful flower!"
  • 18. Two methods • Translate the classifier output into an entry-level category (Image → Classifier → Tulip → Flower) • Directly learn an entry-level classifier (Image → Classifier → Flower)
  • 19. Method 1 • Use a metric to score each node • (Diagram: nodes Bird, sparrow, penguin annotated with linear SVM outputs 0.8, 0.1, 0.9)
  • 20. Method 1 • Add the concept of naturalness • We want node v to be natural, but not so high-level that it loses specificity • φ(v): gets higher the more often v appears in the Google 1T corpus • Specificity term: the max height of the tree under v
  • 21. Method 1 • Combine the two scores • Experiments are skipped since there are too many details...
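
Method 1 (slides 19 to 21) can be sketched as a search over the nodes of the hierarchy. This is only an illustration under stated assumptions: an inner node's visual score is taken to be the sum of the leaf classifier outputs below it, and the two scores are combined as `visual + lam * (alpha * naturalness - (1 - alpha) * height)`. The slides do not give the exact formula, and all names, weights and naturalness values below are hypothetical.

```python
def node_score(node, children, leaf_scores):
    """Visual score of a node: its own classifier output if it is a leaf,
    otherwise the sum over the leaves below it (assumed aggregation)."""
    kids = children.get(node, [])
    if not kids:
        return leaf_scores[node]
    return sum(node_score(kid, children, leaf_scores) for kid in kids)


def height(node, children):
    """Max height of the tree under `node` (0 for a leaf)."""
    kids = children.get(node, [])
    if not kids:
        return 0
    return 1 + max(height(kid, children) for kid in kids)


def best_entry_level(nodes, children, leaf_scores, naturalness,
                     lam=1.0, alpha=0.5):
    """Return the node with the best combined score (assumed combination)."""
    def total(v):
        trade_off = alpha * naturalness[v] - (1 - alpha) * height(v, children)
        return node_score(v, children, leaf_scores) + lam * trade_off

    return max(nodes, key=total)


# Toy hierarchy from the slides; the naturalness values are made up.
children = {"bird": ["sparrow", "penguin"]}
leaf_scores = {"sparrow": 0.8, "penguin": 0.1}
naturalness = {"bird": 2.0, "sparrow": 0.5, "penguin": 1.0}
print(best_entry_level(["bird", "sparrow", "penguin"],
                       children, leaf_scores, naturalness))
```

The height penalty is what keeps the search from always climbing to the root, whose visual score is at least as large as that of any descendant when leaf scores are non-negative.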
  • 23. What I have learned from the two papers above
  • 24. An interesting perspective... • Why can we (as humans) recognize tens of thousands of objects in a really short time? • We have simplified the world; otherwise – We would process things slowly (computation cost) – We would take in too much information (memory cost) • (Just my own observation XD)
  • 25. An explanation for the paper • Different kinds of dolphins have similar properties – So why bother to know every kind of dolphin? • Dolphins share properties with fish – So people think a dolphin is a kind of fish
  • 26. How do we simplify? • Hierarchy matters • But do we follow WordNet?
  • 27. Probably No • Natural Objects – We identify them by properties • Artifacts – We identify them by functionalities
  • 29. Support from the paper • Even when the result is incorrect, animals tend to be miscategorized as other animals. Deng, Jia, et al. "What does classifying more than 10,000 image categories tell us?" Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010. 71-84.
  • 30. Maybe it's because the logic behind how things are made is different (God vs. human). • Artifacts are made for humans to use. • Natural objects are made to live their lives.
  • 31. How to implement? • It is still an open question. Yao, Bangpeng, Jiayuan Ma, and Li Fei-Fei. "Discovering object functionality." Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 2013. Woods, Kevin, et al. "Learning membership functions in a function-based object recognition system." J. Artif. Intell. Res. (JAIR) 3 (1995): 187-222. Weng, Juyang, and Matthew Luciw. "Brain-like emergent spatial processing." Autonomous Mental Development, IEEE Transactions on 4.2 (2012): 161-185.
  • 32. Improving the Fisher Kernel for Large-Scale Image Classification. Perronnin, Florent, Jorge Sánchez, and Thomas Mensink. "Improving the fisher kernel for large-scale image classification." Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010. 143-156.
  • 33. Fisher vector revisited • A kind of image representation – Input: a set of local descriptors – Output: a fixed-length Fisher vector
  • 34. Fisher vector revisited • Use a GMM to model the local descriptors of the input images
  • 35. Fisher vector revisited • Assume: only 2 Gaussians are used
  • 36. Fisher vector revisited • For each image, N=2. Perronnin, Florent, and Christopher Dance. "Fisher kernels on visual vocabularies for image categorization." Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on.
  • 37. Fisher vector revisited • Since we already know the GMM of that image, we can take derivatives • Derivatives – how a change in the parameters changes the fit of the GMM to the image. Perronnin, Florent, and Christopher Dance. "Fisher kernels on visual vocabularies for image categorization." Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on.
  • 38. Fisher vector revisited • Concatenate these derivatives and we get the Fisher vector! • The number of parameters is the same for every image, so the vector has a fixed length. Perronnin, Florent, and Christopher Dance. "Fisher kernels on visual vocabularies for image categorization." Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on.
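
Slides 33 to 38 can be sketched in code. This is a minimal illustration, assuming a diagonal-covariance GMM that has already been trained, and using the normalized gradients with respect to the means and standard deviations as given in the cited Perronnin et al. papers (the mixture-weight gradients are omitted). The function names and the toy GMM are illustrative and do not come from the slides.

```python
import math


def gaussian_diag(x, mu, sigma):
    """Density of a diagonal-covariance Gaussian evaluated at x."""
    p = 1.0
    for xi, mi, si in zip(x, mu, sigma):
        p *= math.exp(-0.5 * ((xi - mi) / si) ** 2) / (si * math.sqrt(2 * math.pi))
    return p


def fisher_vector(descriptors, weights, means, sigmas):
    """Unnormalized Fisher vector of a set of local descriptors.

    Output length is 2 * K * D (K Gaussians, D descriptor dimensions),
    independent of how many descriptors the image has.
    """
    T, K, D = len(descriptors), len(weights), len(means[0])
    g_mu = [[0.0] * D for _ in range(K)]
    g_sigma = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        lik = [weights[k] * gaussian_diag(x, means[k], sigmas[k]) for k in range(K)]
        total = sum(lik)
        if total == 0.0:  # descriptor too far from every Gaussian (underflow)
            continue
        for k in range(K):
            gamma = lik[k] / total  # soft assignment of x to Gaussian k
            for d in range(D):
                z = (x[d] - means[k][d]) / sigmas[k][d]
                g_mu[k][d] += gamma * z
                g_sigma[k][d] += gamma * (z * z - 1.0)
    fv = []
    for k in range(K):  # gradients w.r.t. the means
        fv += [g / (T * math.sqrt(weights[k])) for g in g_mu[k]]
    for k in range(K):  # gradients w.r.t. the standard deviations
        fv += [g / (T * math.sqrt(2.0 * weights[k])) for g in g_sigma[k]]
    return fv


# Toy GMM with 2 Gaussians over 1-D descriptors, as on slide 35.
weights, means, sigmas = [0.5, 0.5], [[0.0], [4.0]], [[1.0], [1.0]]
print(len(fisher_vector([[0.0], [4.0], [1.0]], weights, means, sigmas)))
print(len(fisher_vector([[2.0]], weights, means, sigmas)))
```

Both calls return vectors of the same length even though the two "images" have different numbers of descriptors, which is the fixed-length property noted on slide 38.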
  • 39. Fisher vector revisited • The form of the Fisher vector – Local descriptors – Fisher vector (not normalized). Perronnin, Florent, Jorge Sánchez, and Thomas Mensink. "Improving the fisher kernel for large-scale image classification." Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010. 143-156.
  • 40. Improvement - L2 Normalization • Assume: the descriptors of a given image follow a distribution p • p has two parts – Background part u_λ (image-independent) – Image-specific part q • Mixture form: p(x) = ω q(x) + (1 − ω) u_λ(x), where ω is the proportion of image-specific content. Perronnin, Florent, Jorge Sánchez, and Thomas Mensink. "Improving the fisher kernel for large-scale image classification." Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010. 143-156.
  • 41. Improvement - L2 Normalization • Decompose the vector into an image-specific part and an image-independent part. Perronnin, Florent, Jorge Sánchez, and Thomas Mensink. "Improving the fisher kernel for large-scale image classification." Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010. 143-156.
  • 42. Improvement - L2 Normalization • The learning process minimizes the image-independent part (after maximum-likelihood training of the GMM, its gradient is close to zero). Perronnin, Florent, Jorge Sánchez, and Thomas Mensink. "Improving the fisher kernel for large-scale image classification." Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010. 143-156.
  • 43. Improvement - L2 Normalization • To remove the dependence on ω, we can L2-normalize the vector. Perronnin, Florent, Jorge Sánchez, and Thomas Mensink. "Improving the fisher kernel for large-scale image classification." Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010. 143-156.
  • 44. Improvement - Power Normalization • As the number of Gaussians increases, the Fisher vector becomes sparser • (Figure panels: 16, 64 and 256 Gaussians)
  • 45. Improvement - Power Normalization • Apply power normalization to each dimension z of the Fisher vector: f(z) = sign(z)|z|^α • α = 0.5 is a reasonable choice for 256 Gaussians
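
The two normalizations on slides 40 to 45 each take only a few lines. This is a sketch, assuming the signed power form sign(z)|z|^α from the cited paper and applying power normalization before L2 normalization; the helper names are hypothetical.

```python
import math


def power_normalize(vector, alpha=0.5):
    """Apply sign(z) * |z| ** alpha to every dimension.

    With 0 < alpha < 1, values below 1 in magnitude are pushed away from
    zero, which makes a sparse Fisher vector less peaky.
    """
    return [math.copysign(abs(z) ** alpha, z) for z in vector]


def l2_normalize(vector):
    """Scale the vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(z * z for z in vector))
    if norm == 0.0:
        return list(vector)
    return [z / norm for z in vector]


def normalize_fisher_vector(vector, alpha=0.5):
    """Power normalization followed by L2 normalization."""
    return l2_normalize(power_normalize(vector, alpha))


print(power_normalize([4.0, -9.0, 0.0]))
print(l2_normalize([3.0, 4.0]))
```

Because L2 normalization divides out any global scale factor, two images that differ only in the proportion ω of image-specific content end up with nearly the same vector, which is the motivation given on slide 43.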
  • 47. Improvement - Spatial Pyramid • Combine the spatial pyramid with the Fisher kernel (FK) • (Figure: BoW histogram)
  • 48. Improvement - Spatial Pyramid • Combine the spatial pyramid with the Fisher kernel (FK) • (Figure: Fisher vector)
  • 49. Large-Scale Experiments • Training: ImageNet, Flickr groups, VOC 2007 trainval • Testing: PASCAL VOC 2007 (20 classes). [29] Harzallah, Hedi, Frédéric Jurie, and Cordelia Schmid. "Combining efficient object localization and image classification." Computer Vision, 2009 IEEE 12th International Conference on.
  • 50. Another thing I want to share • Deep learning can be used in robotics! Lenz, Ian, Honglak Lee, and Ashutosh Saxena. "Deep Learning for Detecting Robotic Grasps." To appear in International Journal of Robotics Research (IJRR), 2014.
  • 51. Thanks for your attention.