Mining Web Images for Concept Learning
A work by Eren Golge (MS thesis, 2014)
Advisor: Asst. Prof. Dr. Pinar Duygulu
– Large annotated image sets are hard to obtain.
– Use weakly labeled images from the Internet.
– Internet images suffer from polysemy and irrelevancy.
– Target concepts show large visual variations (sub-grouping).
– Use our methods, CMAP and AME :)
Meta Pipeline
Weakly Labelled Images → Refine DATA (Polysemy, Irrelevancy, Sub-Grouping) → Learn Classifiers → High quality Concept Models
(Figure labels: Data size, Model learning)
Short Retrospective
Using an annotated control set as a starting point:
– Fergus et al. [1]; OPTIMOL, Li and Fei-Fei [3]
– We use a fully autonomous framework.
Using textual captions:
– Berg et al. [2]
– We use only visual content.
Mining discriminative image cues:
– Singh et al. [5] “Discriminative Patches”; Q. Li et al. [4]
– We use a single computer, with faster and better results.
CMAP and AME have broader potential applications.
[1] Fergus, R., Fei-Fei, L., Perona, P., Zisserman, A.: Learning object categories from google’s image search. In: Computer Vision, 2005. ICCV 2005
[2] Berg, T.L., Berg, A.C., Edwards, J., Maire, M., White, R., Teh, Y.W., Learned-Miller, E.G., Forsyth, D.A.: Names and faces in the news. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Volume 2. (2004) 848–854
[3] Li, L.J., Fei-Fei, L.: Optimol: automatic online picture collection via incremental model learning. International Journal of Computer Vision 88(2) (2010) 147–168
[4] Li, Q., Wu, J., Tu, Z.: Harvesting mid-level visual concepts from large-scale internet images. CVPR (2013)
[5] Singh, S., Gupta, A., Efros, A.A.: Unsupervised discovery of mid-level discriminative patches. In: European Conference on Computer Vision (ECCV) (2012)
Method #1: ConceptMap (CMAP)
Outlier detection +
Accepted for
Draft version:
Polysemy and Sub-Grouping
CMAP's motivation
A very generic method, applicable to other domains as well (textual, biological, etc.)
Extension of SOM (a.k.a. Kohonen's map) *
Inspired by biological phenomena **
Able to cluster data and detect outliers
→ Addresses irrelevancy and sub-grouping
*Kohonen, T.: Self-organizing maps. Springer (1997)
**Hubel, D.H., Wiesel, T.N.: Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. The Journal of Physiology 160(1) (1962) 106
Outlier clusters
Outlier instances in salient clusters
CMAP (cont.): finding outlier units
Look at the activation statistics of each SOM unit during the learning phase.
Later learning iterations are more reliable.
If a unit is activated:
(Figure: winner activations vs. neighbor activations)
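The unit-level outlier test can be sketched with a plain SOM in NumPy. This is a minimal sketch, not CMAP itself: the grid size, the linear weighting that makes later iterations count more, the 0.5 weight for neighbor activations and the "below a fraction of the mean activation" threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def train_som_with_activations(X, grid=(4, 4), n_iter=2000, lr0=0.5,
                               sigma0=1.5, neighbor_weight=0.5, seed=0):
    """Train a basic SOM and accumulate iteration-weighted activation counts.

    Each step adds weight (t + 1) / n_iter to the winner unit and
    neighbor_weight times that to its 4-connected grid neighbors, so later
    (more reliable) iterations count more. The weighting is hypothetical.
    """
    rng = np.random.default_rng(seed)
    n_units = grid[0] * grid[1]
    # Grid coordinates of each unit, used by the neighborhood function.
    coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])],
                      dtype=float)
    # Initialize unit weights from randomly chosen data points.
    W = X[rng.choice(len(X), n_units, replace=len(X) < n_units)].astype(float)
    activation = np.zeros(n_units)
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.3)  # shrinking neighborhood radius
        winner = np.argmin(np.sum((W - x) ** 2, axis=1))
        grid_d2 = np.sum((coords - coords[winner]) ** 2, axis=1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        W += lr * h[:, None] * (x - W)
        weight = (t + 1) / n_iter
        activation[winner] += weight
        activation[(grid_d2 > 0) & (grid_d2 <= 1.0)] += neighbor_weight * weight
    return W, activation


def outlier_units(activation, frac=0.25):
    """Flag units activated less than `frac` times the mean (hypothetical rule)."""
    return activation < frac * activation.mean()
```

Units flagged by `outlier_units` would be treated as outlier clusters; the remaining units act as salient clusters.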
CMAP (cont.): finding sole outliers
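Detecting single outlier instances inside otherwise salient clusters can be illustrated with a per-unit distance test. This is a hedged sketch: the slides do not spell out the criterion, so the "distance to the best-matching unit above mean + k·std of that unit's distances" rule and the value k = 2 are hypothetical.

```python
import numpy as np


def sole_outliers(X, W, k=2.0):
    """Flag instances that lie far from the SOM unit they map to.

    X: (n, d) instances; W: (m, d) unit weight vectors (e.g. salient units).
    An instance is flagged if its distance to its best-matching unit (BMU)
    exceeds mean + k * std of the distances of all instances mapped to that
    unit. The mean + k*std rule is a hypothetical choice for illustration.
    """
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    bmu = d2.argmin(axis=1)
    dist = np.sqrt(d2[np.arange(len(X)), bmu])
    flags = np.zeros(len(X), dtype=bool)
    for u in np.unique(bmu):
        idx = bmu == u
        mu, sd = dist[idx].mean(), dist[idx].std()
        flags[idx] = dist[idx] > mu + k * sd
    return bmu, flags
```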
CMAP mapping
CMAP in action
Learning Models
Learn L1-regularized linear SVM models
– Easier to train
– Better suited to high-dimensional data (wide data matrix)
– Implicit feature selection through the L1 penalty
Learn one linear model from each salient cluster
Each concept has multiple models
– Addresses polysemy
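One L1-regularized linear SVM per salient cluster can be sketched in NumPy. The solver is an assumption: it minimizes the squared hinge loss plus an L1 penalty by proximal gradient descent (ISTA), a stand-in rather than the solver used in the thesis. Combining a concept's models by taking the maximum score is likewise a hypothetical fusion rule.

```python
import numpy as np


def train_l1_svm(X, y, lam=0.01, n_iter=500):
    """L1-regularized linear SVM (squared hinge loss) via proximal gradient.

    Minimizes mean(max(0, 1 - y * (X @ w + b))**2) + lam * ||w||_1, y in {-1, +1}.
    The bias b is not regularized.
    """
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])           # append a bias column
    # Step size 1/L, L an upper bound on the Lipschitz constant of the gradient.
    step = n / (2.0 * np.linalg.norm(Xb, 2) ** 2)
    w = np.zeros(d + 1)
    for _ in range(n_iter):
        active = np.maximum(1.0 - y * (Xb @ w), 0.0)
        grad = -2.0 / n * (Xb.T @ (active * y))
        w = w - step * grad
        # Soft-thresholding: proximal step for the L1 term (bias excluded).
        w[:-1] = np.sign(w[:-1]) * np.maximum(np.abs(w[:-1]) - step * lam, 0.0)
    return w[:-1], w[-1]


def train_cluster_models(clusters, negatives, lam=0.01):
    """One model per salient cluster: cluster instances vs. a shared negative set."""
    models = []
    for pos in clusters:
        X = np.vstack([pos, negatives])
        y = np.concatenate([np.ones(len(pos)), -np.ones(len(negatives))])
        models.append(train_l1_svm(X, y, lam))
    return models


def concept_score(models, X):
    """Score instances for a concept as the max over its per-cluster models."""
    return np.max([X @ w + b for w, b in models], axis=0)
```

With scikit-learn available, `LinearSVC(penalty="l1", loss="squared_hinge", dual=False)` optimizes the same kind of objective.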
CMAP Overview
Via CMAP: learn attributes, faces, scenes and objects, and improve detection performance.
Only images are used for learning.
Problems tackled:
– Attribute Learning: ImageNet [1], Bing Images, eBay [2]
Learn texture and color attributes
– Scene Learning: MIT-indoor [4], Scene-15 [5]
Use learned attributes as mid-level features
– Face Recognition: FAN-Large [6]
– Object Recognition: Google dataset [3]
– Faster Object Detection: enhancement over Selective Search [7]
[1] Russakovsky, O., Fei-Fei, L.: Attribute learning in large-scale datasets. In: Trends and Topics in Computer Vision. Springer (2012)
[2] Van De Weijer, J., Schmid, C., Verbeek, J., Larlus, D.: Learning color names for real-world applications. Image Processing, IEEE (2009)
[3] Fergus, R., Fei-Fei, L., Perona, P., Zisserman, A.: Learning object categories from google’s image search. In: Computer Vision, 2005. ICCV 2005
[4] Quattoni, A., Torralba, A.: Recognizing indoor scenes. CVPR (2009)
[5] Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. CVPR 2006
[6] Ozcan, M., Luo, J., Ferrari, V., Caputo, B.: A large-scale database of images and captions for automatic face naming. In: BMVC. (2011)
[7] Uijlings, J.R.R., et al.: Selective search for object recognition. International Journal of Computer Vision 104(2) (2013) 154–171
Visual Examples
Visual Examples: Faces
(Figure columns: Salient Clusters | Outlier Clusters | Outlier Instances)
Implementation Details
Visual Features:
– SIFT bag-of-words (BoW) with 4000 words (for texture attributes, objects and faces)
– 3D Lab color histograms with 10×20×20 bins (for color attributes)
– 256-dimensional LBP [1] (for objects and faces)
– Attribute: extract random, non-overlapping 100×100 patches from each image.
– Scene: represent each image by the confidence scores of the attribute classifiers, arranged in a spatial pyramid.
– Face: apply face detection [2] to each image and keep the single highest-scoring patch.
– Object: apply unsupervised saliency detection [3] to each image and keep the single highest-activation region.
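The color-attribute features can be sketched as follows. Assumptions: the image is already converted to CIE Lab with L in [0, 100] and a, b in [-128, 127], and "random non-overlapping patches" is implemented by sampling cells of a regular 100×100 grid, which guarantees no overlap; the function names are hypothetical.

```python
import numpy as np


def lab_histogram(lab, bins=(10, 20, 20)):
    """Normalized 3D color histogram of an image already converted to CIE Lab.

    lab: (H, W, 3) array, L in [0, 100] and a, b in [-128, 127] (assumed ranges).
    Returns a flattened vector of 10 * 20 * 20 = 4000 bins that sums to 1.
    """
    pixels = lab.reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=bins,
                             range=[(0, 100), (-128, 127), (-128, 127)])
    hist = hist.ravel()
    return hist / max(hist.sum(), 1.0)


def random_patches(img, size=100, n=None, seed=0):
    """Random non-overlapping size x size patches, drawn from grid cells."""
    rng = np.random.default_rng(seed)
    rows, cols = img.shape[0] // size, img.shape[1] // size
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    order = rng.permutation(len(cells))
    if n is not None:
        order = order[:n]
    return [img[cells[i][0] * size:(cells[i][0] + 1) * size,
                cells[i][1] * size:(cells[i][1] + 1) * size] for i in order]
```

Each patch would then be described by `lab_histogram` (color) or the SIFT BoW (texture).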
Model Learning
– Use outliers and a sample of other concepts' instances as the negative set.
– Apply hard negative mining [4].
– Tune all hyperparameters (classifier and RSOM parameters) via cross-validation.
– We train concept models on Google images and therefore have to deal with DOMAIN ADAPTATION.
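The hard negative mining step can be sketched as a loop that retrains while adding pool negatives that violate the margin. Assumptions: a closed-form ridge (regularized least-squares) classifier stands in for the SVM to keep the example short, and the margin threshold (score > -1), initial cache size and number of rounds are hypothetical.

```python
import numpy as np


def ridge_classifier(X, y, lam=1.0):
    """Closed-form regularized least-squares linear classifier (SVM stand-in)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    w = np.linalg.solve(A, Xb.T @ y)
    return w[:-1], w[-1]


def hard_negative_mining(pos, neg_pool, init_neg=50, rounds=3, lam=1.0, seed=0):
    """Retrain while adding pool negatives that score above -1 (margin violators).

    Returns the model from the last training round and the negative cache.
    """
    rng = np.random.default_rng(seed)
    start = rng.choice(len(neg_pool), size=min(init_neg, len(neg_pool)),
                       replace=False)
    cache = set(start.tolist())
    for _ in range(rounds):
        neg = neg_pool[sorted(cache)]
        X = np.vstack([pos, neg])
        y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])
        w, b = ridge_classifier(X, y, lam)
        hard = set(np.flatnonzero(neg_pool @ w + b > -1.0).tolist())
        if hard <= cache:   # no new hard negatives: stop early
            break
        cache |= hard
    return w, b, sorted(cache)
```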
[1] Ojala, T., Pietikainen, M., Maenpaa, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7) (2002) 971–987
[2] Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2012) 2879–2886
[3] Erdem, E., Erdem, A.: Visual saliency estimation by nonlinearly integrating features using region covariances. Journal of Vision 13(4) (2013) 1–20
[4] Felzenszwalb, P.F., et al.: Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(9) (2010) 1627–1645
Attribute Learning
Dataset              Ours   State of the art
Attribute ImageNet   0.37   0.36 [4]
Attribute eBay       0.81   0.79 [3]
Attribute Bing       0.82   -
[3] Van De Weijer, J., Schmid, C., Verbeek, J., Larlus, D.: Learning color names for real-world applications. Image Processing, IEEE (2009)
[4] Russakovsky, O., Fei-Fei, L.: Attribute learning in large-scale datasets. In: Trends and Topics in Computer Vision. Springer (2012)
Scene Learning
Method               MIT-indoor   Scene-15
CMAP-A               46.2%        82.7%
CMAP-S               40.8%        80.7%
CMAP-S+HM            41.7%        81.3%
Li et al. [1]        47.6%        82.1%
Pandey et al. [2]    43.1%        -
Kwitt et al. [3]     44%          82.3%
Lazebnik et al. [4]  -            81%
Singh et al. [5]     38%          77%
CMAP-A: attribute-based scene learning.
CMAP-S: scene learning directly from CMAP.
CMAP-S+HM: scene learning from CMAP with hard mining.
[1] Q. Li, J. Wu, and Z. Tu, “Harvesting mid-level visual concepts from large-scale internet images,” CVPR, 2013.
[2] M. Pandey and S. Lazebnik, “Scene recognition and weakly supervised object localization with deformable part-based models,” ICCV, 2011.
[3] R. Kwitt, N. Vasconcelos, and N. Rasiwasia, “Scene recognition on the semantic manifold,” ECCV, 2012.
[4] S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories,” in Computer Vision and Pattern Recognition (CVPR), 2006.
[5] S. Singh, A. Gupta, and A. A. Efros, “Unsupervised discovery of mid-level discriminative patches,” in European Conference on Computer Vision (ECCV), 2012.
Object Learning
Class       CMAP   [1]    [2]
airplane    0.63   0.51   0.76
face        0.67   0.52   0.82
leopard     0.76   0.74   0.89
watch       0.55   0.48   0.53
car         0.97   0.98   0.94
guitar      0.89   0.81   0.60
motorbike   0.98   0.98   0.67
overall     0.78   0.72   0.75
[1] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman, “Learning object categories from google’s image search,” in Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, vol. 2, pp. 1816–1823, IEEE, 2005.
[2] L.-J. Li and L. Fei-Fei, “Optimol: automatic online picture collection via incremental model learning,” International Journal of Computer Vision, vol. 88, no. 2, pp. 147–168, 2010.
→ Dataset provided by [1]
Face Learning
Subset   GBC+CF (half) [1]   CMAP-1   CMAP-2   Baseline
EASY     0.58                0.63     0.66     0.31
HARD     0.32                0.34     0.38     0.18
→ Face learning results, with faces detected by the OpenCV detector.
[1] M. Ozcan, J. Luo, V. Ferrari, and B. Caputo, “A large-scale database of images and captions for automatic face naming,” in BMVC, pp. 1–11, 2011.
Selective Search
(Figure from Uijlings, J.R.R., et al.: “Selective search for object recognition.” International Journal of Computer Vision 104(2) (2013) 154–171.)
Selective Search with CMAP
Remove outlier candidate regions from the detection tree of Selective Search [1].
~3,500 fewer candidate regions per image, with higher MABO* and equal recall.
Method                     MABO   Recall   No. of windows
Objectness [2]             0.69   0.94     1,853
Selective Search [1]       0.87   0.99     10,097
Selective Search + CMAP    0.89   0.99     6,753
* MABO: Mean Average Best Overlap.
[1] Uijlings, Jasper RR, et al. "Selective search for object recognition." International Journal of Computer Vision 104.2 (2013): 154-171.
[2] B. Alexe, T. Deselaers, and V. Ferrari, “Measuring the objectness of image windows,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2189–2202, 2012.
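MABO and recall can be computed as sketched below: for each ground-truth box take the best IoU over all candidate windows, average these per class (ABO), then average over classes (MABO). Assumptions: boxes are (x1, y1, x2, y2), all boxes come from a single image for brevity (over a dataset, each ground-truth box is matched only against its own image's candidates), and recall uses an IoU threshold of 0.5.

```python
import numpy as np


def box_area(b):
    """Area of (x1, y1, x2, y2) boxes; works for one box or an (n, 4) array."""
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])


def iou(box, boxes):
    """Intersection-over-union between one box and an (n, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    return inter / (box_area(box) + box_area(boxes) - inter)


def mabo_and_recall(gt_by_class, candidates, thr=0.5):
    """MABO: mean over classes of the average best overlap of that class's
    ground-truth boxes with the candidates. Recall: fraction of all
    ground-truth boxes whose best overlap is at least `thr`.
    """
    abos, best_all = [], []
    for gts in gt_by_class.values():
        best = np.array([iou(g, candidates).max() for g in gts])
        abos.append(best.mean())
        best_all.extend(best.tolist())
    return float(np.mean(abos)), float(np.mean(np.array(best_all) >= thr))
```

For example, with candidates `[[0, 0, 10, 10], [20, 20, 30, 30]]`, a "cat" box `[0, 0, 10, 10]` (best IoU 1.0) and a "dog" box `[20, 20, 30, 40]` (best IoU 0.5), MABO is 0.75 and recall is 1.0.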
Results Summary
Task                 Ours   State of the art
Face                 0.66   0.58 [1]
Scene                0.47   0.48 [5]
Object               0.78   0.75 [2]
Attribute ImageNet   0.37   0.36 [3]
Attribute eBay       0.81   0.79 [4]
Attribute Bing       0.82   -
[1] Ozcan, M., Luo, J., Ferrari, V., Caputo, B.: A large-scale database of images and captions for automatic face naming. In: BMVC (2011)
[2] Fergus, R., Fei-Fei, L., Perona, P., Zisserman, A.: Learning object categories from google’s image search. In: Computer Vision, 2005. ICCV 2005
[3] Russakovsky, O., Fei-Fei, L.: Attribute learning in large-scale datasets. In: Trends and Topics in Computer Vision. Springer (2012)
[4] Van De Weijer, J., Schmid, C., Verbeek, J., Larlus, D.: Learning color names for real-world applications. Image Processing, IEEE (2009)
[5] Li, Q., Wu, J., Tu, Z.: Harvesting mid-level visual concepts from large-scale internet images. CVPR (2013)
AME – Association through Model Evolution
Iterative data cleansing
Measure discriminativeness and representativeness.
Define category versus random instances.
AME's motivation
Another agnostic data-refining method, targeting irrelevancy.
Exploits abundant random instances, as opposed to a limited number of annotated instances.
Sidesteps sub-grouping by using very high-dimensional representations.
AME's method overview
First, discern category candidates (CC) from a random set (RS).
Define category references (CR).
Second, discern CR from the rest of CC.
Define spurious instances (SI) against CR and eliminate them.
→ Addresses irrelevancy
Discerning the category from the random set
– Learn a linear model M1 between CC and RS.
– Take the most confidently classified instances as the category references (CR).
Discerning category references from the others
– Learn a linear model M2 between CR and the other instances.
Define SI against CR.
Eliminate SI.
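The two-model loop can be sketched as follows. This is a sketch under stated assumptions, not the thesis implementation: M1 and M2 are plain L2-regularized logistic regressions trained by gradient descent (the slides use L1 logistic regression), and the rule that defines spurious instances (lowest sum of z-scored M1 and M2 scores among non-reference candidates) is hypothetical. The defaults `n_ref=10` and `n_elim=5` are illustrative; 5 eliminations per iteration matches the slides' implementation details.

```python
import numpy as np


def logreg(X, y, lam=1e-2, lr=0.1, n_iter=300):
    """L2-regularized logistic regression by gradient descent, y in {0, 1}.
    Returns a scoring function (stand-in for the slides' L1 logistic regression)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))
        grad = Xb.T @ (p - y) / len(y) + lam * np.r_[w[:-1], 0.0]
        w -= lr * grad
    return lambda Z: np.hstack([Z, np.ones((len(Z), 1))]) @ w


def ame(CC, RS, n_iter=3, n_ref=10, n_elim=5):
    """Iteratively remove spurious instances (SI) from category candidates (CC)."""
    keep = np.arange(len(CC))
    removed = []
    for _ in range(n_iter):
        cc = CC[keep]
        # M1: category candidates (label 1) vs. random set (label 0).
        m1 = logreg(np.vstack([cc, RS]), np.r_[np.ones(len(cc)), np.zeros(len(RS))])
        s1 = m1(cc)
        # Category references (CR): the most confidently classified candidates.
        is_ref = np.zeros(len(cc), dtype=bool)
        is_ref[np.argsort(-s1)[:n_ref]] = True
        # M2: category references vs. the remaining candidates.
        s2 = logreg(cc, is_ref.astype(float))(cc)
        # Hypothetical SI rule: lowest combined z-score; references are never removed.
        z = (s1 - s1.mean()) / (s1.std() + 1e-12) + (s2 - s2.mean()) / (s2.std() + 1e-12)
        z[is_ref] = np.inf
        si = np.argsort(z)[:n_elim]
        removed.extend(keep[si].tolist())
        keep = np.delete(keep, si)
    return keep, removed
```

The returned `keep` indices would form the cleaned training set for the final model.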
Visual Examples
High Dimensional Representation
Our problem is discrimination.
Therefore:
– High-dimensional representations tend to make a category linearly separable from the others despite category sub-grouping.
→ Addresses sub-grouping
Feature Learning
Learn frequent patterns in the data.
Learning pipeline (similar to [1]):
1. Sample random n×n patches from the images.
On the collected patches:
2. Contrast normalization
3. ZCA whitening
4. K-means for C words
Over the whole image:
5. Spatial (max or average) pooling of the C word responses
Learned Visual Words
[1] Coates, Adam, Andrew Y. Ng, and Honglak Lee. "An analysis of single-layer networks in unsupervised feature learning." International Conference on Artificial Intelligence and Statistics. 2011.
Result: a 5 × C dimensional vector for each image.
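The five-step pipeline can be sketched in NumPy, using the parameters listed in the implementation details (6×6 receptive field, stride 1, ZCA ε = 0.5). Assumptions not stated on the slides: the contrast-normalization constant (10), the "triangle" k-means activation from Coates et al. as the encoder, and the 5 pooling regions being the whole image plus its four quadrants (one way to obtain 5 × C values). `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

RF = 6  # receptive field size from the slides (6x6 pixels)


def patches(img, rf=RF):
    """All rf x rf patches of a gray image at stride 1, shape (rows, cols, rf*rf)."""
    win = sliding_window_view(img, (rf, rf))
    return win.reshape(win.shape[0], win.shape[1], rf * rf)


def contrast_normalize(P, eps=10.0):
    """Per-patch mean removal and variance scaling (eps=10 is an assumed value)."""
    P = P - P.mean(axis=-1, keepdims=True)
    return P / np.sqrt(P.var(axis=-1, keepdims=True) + eps)


def sq_dists(A, B):
    """Squared Euclidean distances between the rows of A and the rows of B."""
    d = (A ** 2).sum(1)[:, None] - 2.0 * A @ B.T + (B ** 2).sum(1)[None, :]
    return np.maximum(d, 0.0)


def learn_words(images, C, n_patches=10000, zca_eps=0.5, n_iter=10, seed=0):
    """Steps 1-4: random patches -> contrast normalization -> ZCA -> k-means."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_patches):
        img = images[rng.integers(len(images))]
        y = rng.integers(img.shape[0] - RF + 1)
        x = rng.integers(img.shape[1] - RF + 1)
        samples.append(img[y:y + RF, x:x + RF].ravel())
    P = contrast_normalize(np.array(samples, dtype=float))
    mean = P.mean(axis=0)
    U, S, _ = np.linalg.svd(np.cov(P - mean, rowvar=False))
    W_zca = U @ np.diag(1.0 / np.sqrt(S + zca_eps)) @ U.T
    Z = (P - mean) @ W_zca
    D = Z[rng.choice(len(Z), C, replace=False)].copy()  # initial centroids
    for _ in range(n_iter):
        assign = sq_dists(Z, D).argmin(axis=1)
        for k in range(C):
            if np.any(assign == k):
                D[k] = Z[assign == k].mean(axis=0)
    return mean, W_zca, D


def encode(img, mean, W_zca, D, pool=np.max):
    """Step 5: triangle activations pooled over the whole image and its four
    quadrants (assumed layout), giving 5 * C values."""
    P = contrast_normalize(patches(img).astype(float))
    rows, cols, _ = P.shape
    Z = (P.reshape(-1, RF * RF) - mean) @ W_zca
    dist = np.sqrt(sq_dists(Z, D))
    F = np.maximum(0.0, dist.mean(axis=1, keepdims=True) - dist)
    F = F.reshape(rows, cols, -1)
    r, c = rows // 2, cols // 2
    regions = [F, F[:r, :c], F[:r, c:], F[r:, :c], F[r:, c:]]
    return np.concatenate([pool(R, axis=(0, 1)) for R in regions])
```

The slides use C = 2400 on 60 px high images; a much smaller C keeps a quick trial fast. Passing `pool=np.mean` gives average pooling instead of max pooling.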
AME overview
Implementation Details
– L1 logistic regression solved with the Gauss–Seidel algorithm [1]
– Final model: L1 linear SVM trained with Grafting [2]
– At each iteration, 5 images are eliminated.
Feature Learning
– Use horizontally flipped images as well.
– Resize each gray-level image to a height of 60 px.
– Apply contrast normalization to the random patches.
– ZCA whitening with ε = 0.5.
– Receptive field size: 6×6 pixels
– 1 px stride, with 2400 words
[1] Shirish Krishnaj Shevade and S Sathiya Keerthi. A simple and efficient algorithm for gene selection using sparse logistic regression. Bioinformatics, 19(17):2246–2253, 2003.
[2] Simon Perkins, Kevin Lacker, and James Theiler. Grafting: Fast, incremental feature selection by gradient descent in function space. The Journal of Machine Learning Research, 3:1333–1356, 2003.
FAN-Large [2]
– EASY subset: faces larger than 60×70 px, 138 categories.
– ALL: no constraint, 365 categories.
PubFig83 [1]
– Subset of PubFig with 83 celebrities, each with at least 100 images.
[1] N. Pinto, Z. Stone, T. Zickler, and D. Cox, “Scaling up biologically-inspired computer vision: A case study in unconstrained face recognition on facebook,” in Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on, pp. 35–42, IEEE, 2011.
[2] M. Ozcan, J. Luo, V. Ferrari, and B. Caputo, “A large-scale database of images and captions for automatic face naming,” in BMVC, pp. 1–11, 2011.
Classification Pipeline
● No data refining
● Models are trained on the training set of the given dataset
● Results on PubFig83
● ~5% improvement over the state of the art
● Better results with more words
[1] N. Pinto, Z. Stone, T. Zickler, and D. Cox, “Scaling up biologically-inspired computer vision: A case study in unconstrained face recognition on facebook,” in Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on, pp. 35–42, IEEE, 2011.
[2] B. C. Becker and E. G. Ortiz, “Evaluating open-universe face identification on the web,” in Computer Vision and Pattern Recognition Workshops (CVPRW), 2013 IEEE Conference on, pp. 904–911, IEEE, 2013.
AME results
● Baseline is the same classification pipeline without any data refining.
● All models are learned from web images.
[18] S. Singh, A. Gupta, and A. A. Efros, “Unsupervised discovery of mid-level discriminative patches,” in European Conference on Computer Vision (ECCV), 2012.
AME: False vs. True Eliminations
→ Cumulative plot of correct versus false outlier detections until AME finds all the outliers for all classes. The values of each iteration are added to those of the previous iterations.
AME: Cross-Validation Accuracies
→ Cross-validation (final model) and M1 accuracies as the algorithm proceeds.
This shows a clear correlation between the cross-validated final classifier and the M1 models, without the M1 models over-fitting.
AME: Number of Eliminations vs. Accuracy
→ Effect of the number of outliers removed per iteration on the final test accuracy.
Eliminating more than a certain number of instances per iteration degrades final performance; eliminating 1 instance per iteration is the safest choice when no sanity check is available.
Final Words
Which one to choose?
Polysemy + irrelevancy in the data → CMAP
Only irrelevancy in the data → AME
Another option:
– Use AME first, then CMAP
– Not yet tested!
In the End
We propose two novel algorithms, CMAP and AME.
Compelling results against state-of-the-art methods on a variety of vision tasks
Learn complex visual concepts with a simple
That's the end... Thanks for your valuable time :)

ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxmaryFF1
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫qfactory1
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju

Dernier (20)

GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
Biological classification of plants with detail
Biological classification of plants with detailBiological classification of plants with detail
Biological classification of plants with detail
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
Ai in communication electronicss[1].pptx
Ai in communication electronicss[1].pptxAi in communication electronicss[1].pptx
Ai in communication electronicss[1].pptx
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf


  • 1. Mining Web Images for Concept Learning. A work by Eren Golge. Advisor: Asst. Prof. Dr. Pinar Duygulu
  • 2. Motivation ● Problem – Large annotated image collections are hard to obtain. ● Solution – Use weakly labeled images from the Internet. ● But – Internet images suffer from polysemy and irrelevancy. – Target concepts vary visually (sub-grouping). ● Then – Use our methods, CMAP and AME :)
  • 4. Meta Pipeline: GATHER DATA from the web (weakly labelled images) → REFINE DATA (polysemy, irrelevancy, sub-grouping) → LEARN CLASSIFIERS (high-quality concept models)
  • 5. Short Retrospective ● Use an annotated control set as a starting point – Fergus et al. [1]; OPTIMOL, Li and Fei-Fei [3] – We use a fully autonomous framework. ● Use textual captions – Berg and Forsyth [2] – We use only visual content. ● Discriminative image cues – Singh, Gupta and Efros [5] ("Discriminative Patches"); Q. Li et al. [4] – We use a single computer and obtain faster and better results. ● CMAP and AME have broader applications. [1] Fergus, R., Fei-Fei, L., Perona, P., Zisserman, A.: Learning object categories from Google's image search. ICCV (2005) [2] Berg, T.L., Berg, A.C., Edwards, J., Maire, M., White, R., Teh, Y.W., Learned-Miller, E.G., Forsyth, D.A.: Names and faces in the news. CVPR, Volume 2 (2004) 848–854 [3] Li, L.J., Fei-Fei, L.: OPTIMOL: automatic online picture collection via incremental model learning. International Journal of Computer Vision 88(2) (2010) 147–168 [4] Li, Q., Wu, J., Tu, Z.: Harvesting mid-level visual concepts from large-scale internet images. CVPR (2013) [5] Singh, S., Gupta, A., Efros, A.A.: Unsupervised discovery of mid-level discriminative patches. ECCV (2012)
  • 7. Method #1: CMAP (Concept Map) = Clustering + Outlier detection. Clustering handles polysemy and sub-grouping; outlier detection handles irrelevancy. Accepted for … Draft version: …
  • 8. CMAP's motivation ● A very generic method, applicable to other domains as well (textual, biological, etc.) ● An extension of the SOM (a.k.a. Kohonen's map)* ● Inspired by biological phenomena** ● Able to cluster data and detect outliers ● Irrelevancy and sub-grouping SOLVED!! (Figure: outlier clusters; outlier instances in salient clusters) *Kohonen, T.: Self-organizing maps. Springer (1997) **Hubel, D.H., Wiesel, T.N.: Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of Physiology 160(1) (1962) 106–154
  • 9. CMAP (cont.): finding outlier units ● Look at the activation statistics of each SOM unit during the learning phase ● Later learning iterations are more reliable ● If a unit is activated RARELY → OUTLIER; FREQUENTLY → SALIENT (Figure: winner activations, neighbor activations)
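The outlier-unit rule above can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' RSOM implementation: it trains a plain 1-D SOM, credits the winner (and, with a Gaussian weight, its neighbors) at every iteration, and weights later iterations more heavily, which is our reading of "later iterations are more reliable". The names `train_som_with_activation_stats` and `outlier_units` and the knobs `ratio` and `alpha` are hypothetical.

```python
import numpy as np


def train_som_with_activation_stats(x, n_units=10, n_iters=2000, lr0=0.5,
                                    sigma0=2.0, seed=0):
    """Train a 1-D SOM and accumulate time-weighted activation statistics per unit."""
    rng = np.random.default_rng(seed)
    weights = x[rng.choice(len(x), size=n_units, replace=False)].astype(float)
    grid = np.arange(n_units)                     # unit coordinates on a 1-D map
    winner_score = np.zeros(n_units)
    neighbor_score = np.zeros(n_units)
    for t in range(n_iters):
        frac = t / n_iters
        lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighborhood radius
        time_weight = (t + 1) / n_iters           # assumption: later iterations count more
        sample = x[rng.integers(len(x))]
        bmu = int(np.argmin(((weights - sample) ** 2).sum(axis=1)))  # winner unit
        h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))        # neighborhood kernel
        weights += lr * h[:, None] * (sample - weights)
        winner_score[bmu] += time_weight
        others = grid != bmu
        neighbor_score[others] += time_weight * h[others]
    return weights, winner_score, neighbor_score


def outlier_units(winner_score, neighbor_score, ratio=0.3, alpha=0.5):
    """Flag rarely activated units; `ratio` and `alpha` are hypothetical knobs."""
    activation = winner_score + alpha * neighbor_score
    return np.flatnonzero(activation < ratio * activation.mean())


rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.3, (200, 2)),      # salient group 1
                  rng.normal(5.0, 0.3, (200, 2)),      # salient group 2
                  rng.uniform(-10.0, 15.0, (10, 2))])  # a few scattered outliers
weights, win, nb = train_som_with_activation_stats(data)
print("outlier units:", outlier_units(win, nb))
```

A unit is called an outlier here when its weighted activation falls below `ratio` times the mean over all units; the slides give no numeric threshold, so this value is purely illustrative.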
  • 10. CMAP (cont.): finding sole outliers (Figure: outlier instances marked with x)
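The slides do not spell out how sole outliers inside salient clusters are found. One plausible rule, which is our assumption rather than the published CMAP criterion, flags an instance whose distance to its best-matching unit (BMU) is more than `k` standard deviations above the mean distance of all instances mapped to that unit. `sole_outliers` is a hypothetical helper.

```python
import numpy as np


def sole_outliers(x, unit_weights, k=2.0):
    """Indices of instances that lie unusually far from their best-matching unit.

    Assumed rule (not taken from the slides): an instance is a sole outlier if
    its distance to its BMU exceeds mean + k * std of the distances of all
    instances mapped to that same unit.
    """
    dists = np.linalg.norm(x[:, None, :] - unit_weights[None, :, :], axis=-1)
    bmu = dists.argmin(axis=1)
    d_bmu = dists[np.arange(len(x)), bmu]
    flagged = []
    for unit in np.unique(bmu):
        members = np.flatnonzero(bmu == unit)
        d = d_bmu[members]
        flagged.extend(members[d > d.mean() + k * d.std()].tolist())
    return sorted(flagged)


rng = np.random.default_rng(0)
units = np.array([[0.0, 0.0], [10.0, 10.0]])        # e.g. weights of two salient units
points = np.vstack([rng.normal(0.0, 0.1, (50, 2)),  # tight cluster around unit 0
                    [[3.0, 0.0]]])                  # one stray instance, index 50
print(sole_outliers(points, units))
```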
  • 13. Learning Models ● Learn L1-regularized linear SVM models – Easy to train – Well suited to high-dimensional data (wide data matrix) – Implicit feature selection through the L1 norm ● Learn one linear model from each salient cluster ● Each concept therefore has multiple models – Polysemy SOLVED!!
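A sketch of the per-cluster model learning, assuming numpy only. It trains an L1-regularized linear SVM with a squared hinge loss by proximal gradient descent (soft-thresholding), a stand-in for whatever solver the authors used. Combining a concept's models by taking the maximum score is our assumption; all function names are hypothetical.

```python
import numpy as np


def train_l1_svm(x, y, lam=0.01, n_iters=500):
    """L1-regularized linear SVM (squared hinge loss) trained by proximal gradient.

    Minimizes mean(max(0, 1 - y * (x @ w + b)) ** 2) + lam * ||w||_1; labels are +1/-1.
    """
    n, d = x.shape
    a = np.hstack([x, np.ones((n, 1))])
    step = n / (2.0 * np.linalg.norm(a, 2) ** 2)   # 1 / Lipschitz constant of the loss gradient
    w, b = np.zeros(d), 0.0
    for _ in range(n_iters):
        slack = np.maximum(0.0, 1.0 - y * (x @ w + b))
        grad_w = -2.0 / n * (x.T @ (slack * y))
        grad_b = -2.0 / n * np.sum(slack * y)
        z = w - step * grad_w
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold = L1 prox
        b -= step * grad_b                                          # bias is not penalized
    return w, b


def train_concept_models(clusters, negatives, lam=0.01):
    """One linear model per salient cluster: cluster members (+1) vs. negatives (-1)."""
    models = []
    for members in clusters:
        x = np.vstack([members, negatives])
        y = np.concatenate([np.ones(len(members)), -np.ones(len(negatives))])
        models.append(train_l1_svm(x, y, lam))
    return models


def concept_score(models, x):
    """Assumed aggregation: a sample's concept score is its best score over the models."""
    return np.max([x @ w + b for w, b in models], axis=0)


rng = np.random.default_rng(0)
d = 20
mu_a = np.zeros(d); mu_a[0] = 3.0          # sub-group / sense A of the concept
mu_b = np.zeros(d); mu_b[1] = 3.0          # sub-group / sense B of the concept
mu_neg = np.zeros(d); mu_neg[:2] = -3.0    # negatives (outliers, other concepts)
cluster_a = mu_a + 0.5 * rng.standard_normal((50, d))
cluster_b = mu_b + 0.5 * rng.standard_normal((50, d))
negatives = mu_neg + 0.5 * rng.standard_normal((100, d))
models = train_concept_models([cluster_a, cluster_b], negatives)
scores = concept_score(models, np.vstack([mu_a, mu_b, mu_neg]))
print(scores > 0)
```

Larger `lam` values zero out more weights, which is the implicit feature selection the slide refers to; a very large `lam` leaves every weight at exactly zero.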
  • 15. via CMAP ● Learn attributes ● Learn faces ● Learn scenes ● Learn objects ● Improve detection performance
  • 16. Experiments ● Only images are used for learning ● Problems addressed: – Attribute learning: ImageNet [1], eBay [2] and Bing [2] images ● Learn texture and color attributes – Scene learning: MIT-indoor [4], Scene-15 [5] ● Use the learned attributes as mid-level features – Face recognition: FAN-Large [6] – Object recognition: Google dataset [3] – Faster object detection: enhancement over Selective Search [7] [1] Russakovsky, O., Fei-Fei, L.: Attribute learning in large-scale datasets. In: Trends and Topics in Computer Vision. Springer (2012) [2] Van De Weijer, J., Schmid, C., Verbeek, J., Larlus, D.: Learning color names for real-world applications. IEEE Transactions on Image Processing (2009) [3] Fergus, R., Fei-Fei, L., Perona, P., Zisserman, A.: Learning object categories from Google's image search. ICCV (2005) [4] Quattoni, A., Torralba, A.: Recognizing indoor scenes. CVPR (2009) [5] Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. CVPR (2006) [6] Ozcan, M., Luo, J., Ferrari, V., Caputo, B.: A large-scale database of images and captions for automatic face naming. BMVC (2011) [7] Uijlings, J.R.R., et al.: Selective search for object recognition. International Journal of Computer Vision 104(2) (2013) 154–171
  • 19. Visual Examples: Faces (Figure: salient clusters, outlier clusters, outlier instances)
  • 20. (Figure: salient clusters, outlier clusters, outlier instances)
  • 21. Implementation Details ● Visual features: – BoW SIFT with 4,000 words (for texture attributes, objects and faces) – 3-D 10×20×20 Lab histograms (for color attributes) – 256-dimensional LBP [1] (for objects and faces) ● Preprocessing – Attributes: extract random, non-overlapping 100×100 patches from each image. – Scenes: represent each image by the confidence scores of the attribute classifiers, arranged in a spatial pyramid. – Faces: apply face detection [2] to each image and keep the single highest-scoring patch. – Objects: apply unsupervised saliency detection [3] to each image and keep the single most salient region. ● Model learning – Use the outliers and a sample of other concepts' instances as the negative set. – Apply hard negative mining [4]. – Tune all hyperparameters (classifier and RSOM parameters) via cross-validation. ● NOTICE: – We train concept models on Google images, so we also have to deal with DOMAIN ADAPTATION. [1] Ojala, T., Pietikainen, M., Maenpaa, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7) (2002) 971–987 [2] Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. CVPR (2012) 2879–2886 [3] Erdem, E., Erdem, A.: Visual saliency estimation by nonlinearly integrating features using region covariances. Journal of Vision 13(4) (2013) 1–20 [4] Felzenszwalb, P.F., et al.: Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(9) (2010) 1627–1645
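The color-attribute preprocessing can be sketched as follows, assuming the images are already converted to L*a*b* (the conversion itself is not shown). Non-overlapping 100×100 patches are drawn from distinct cells of a fixed grid, which is one simple way, assumed here, to guarantee that they never overlap; each patch then becomes a 10×20×20 = 4,000-bin histogram. The value ranges and helper names are assumptions.

```python
import numpy as np


def random_nonoverlapping_patches(image, size=100, count=4, seed=0):
    """Sample `count` distinct cells of a size x size grid, so patches never overlap."""
    rng = np.random.default_rng(seed)
    rows, cols = image.shape[0] // size, image.shape[1] // size
    cells = rng.choice(rows * cols, size=min(count, rows * cols), replace=False)
    return [image[(c // cols) * size:(c // cols + 1) * size,
                  (c % cols) * size:(c % cols + 1) * size] for c in cells]


def lab_histogram(lab_pixels, bins=(10, 20, 20)):
    """L1-normalized 3-D histogram over (L*, a*, b*) with 10 x 20 x 20 = 4000 bins.

    lab_pixels: (N, 3) array already converted to L*a*b*; the value ranges below
    are assumptions (L* in [0, 100], a* and b* in [-128, 128]).
    """
    hist, _ = np.histogramdd(lab_pixels, bins=bins,
                             range=[(0, 100), (-128, 128), (-128, 128)])
    hist = hist.ravel()
    return hist / max(hist.sum(), 1.0)


rng = np.random.default_rng(0)
# Stand-in for an image already converted to L*a*b* (300 x 400 pixels).
image = rng.uniform([0, -128, -128], [100, 128, 128], size=(300, 400, 3))
patches = random_nonoverlapping_patches(image, size=100, count=4)
features = [lab_histogram(p.reshape(-1, 3)) for p in patches]
print(len(features), features[0].shape)
```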
  • 23. Attribute Learning
      Dataset               Ours    State of the art
      Attribute ImageNet    0.37    0.36 [4]
      Attribute eBay        0.81    0.79 [3]
      Attribute Bing        0.82    -
    [3] Van De Weijer, J., Schmid, C., Verbeek, J., Larlus, D.: Learning color names for real-world applications. IEEE Transactions on Image Processing (2009) [4] Russakovsky, O., Fei-Fei, L.: Attribute learning in large-scale datasets. In: Trends and Topics in Computer Vision. Springer (2012) 1–14
  • 24. Scene Learning
      Method                MIT-indoor   Scene-15
      CMAP-A                46.2%        82.7%
      CMAP-S                40.8%        80.7%
      CMAP-S+HM             41.7%        81.3%
      Li et al. [1]         47.6%        82.1%
      Pandey et al. [2]     43.1%        -
      Kwitt et al. [3]      44%          82.3%
      Lazebnik et al. [4]   -            81%
      Singh et al. [5]      38%          77%
    CMAP-A: attribute-based scene learning. CMAP-S: scenes learned directly with CMAP. CMAP-S+HM: scenes learned with CMAP plus hard mining. [1] Q. Li, J. Wu, and Z. Tu, "Harvesting mid-level visual concepts from large-scale internet images," CVPR, 2013. [2] M. Pandey and S. Lazebnik, "Scene recognition and weakly supervised object localization with deformable part-based models," ICCV, 2011. [3] R. Kwitt, N. Vasconcelos, and N. Rasiwasia, "Scene recognition on the semantic manifold," ECCV, 2012. [4] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," CVPR, 2006. [5] S. Singh, A. Gupta, and A. A. Efros, "Unsupervised discovery of mid-level discriminative patches," ECCV, 2012.
  • 25. Object Learning
      Class       CMAP   [1]    [2]
      airplane    0.63   0.51   0.76
      car         0.97   0.98   0.94
      face        0.67   0.52   0.82
      guitar      0.89   0.81   0.60
      leopard     0.76   0.74   0.89
      motorbike   0.98   0.98   0.67
      watch       0.55   0.48   0.53
      overall     0.78   0.72   0.75
    → Dataset provided by [1]. [1] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman, "Learning object categories from Google's image search," in Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, vol. 2, pp. 1816–1823, IEEE, 2005. [2] L.-J. Li and L. Fei-Fei, "OPTIMOL: automatic online picture collection via incremental model learning," International Journal of Computer Vision, vol. 88, no. 2, pp. 147–168, 2010.
  • 26. Face Learning
      Subset   GBC+CF(half) [1]   CMAP-1   CMAP-2   Baseline
      EASY     0.58               0.63     0.66     0.31
      HARD     0.32               0.34     0.38     0.18
    → Face learning results, with faces detected by the OpenCV detector. [1] M. Ozcan, J. Luo, V. Ferrari, and B. Caputo, "A large-scale database of images and captions for automatic face naming," in BMVC, pp. 1–11, 2011.
  • 27. Selective Search (Figure from Uijlings, Jasper RR, et al. "Selective search for object recognition." International Journal of Computer Vision 104.2 (2013): 154-171.)
  • 28. Selective Search with CMAP ● Remove outlier candidate regions from the detection tree of Selective Search [1] ● About 3,300 fewer candidate regions per image (6,753 vs. 10,097), with equal recall and higher MABO*
      Method                    MABO   Recall   No. of windows
      Objectness [2]            0.69   0.94     1,853
      Selective Search [1]      0.87   0.99     10,097
      Selective Search + CMAP   0.89   0.99     6,753
    * MABO: Mean Average Best Overlap. [1] Uijlings, Jasper RR, et al. "Selective search for object recognition." International Journal of Computer Vision 104.2 (2013): 154-171. [2] B. Alexe, T. Deselaers, and V. Ferrari, "Measuring the objectness of image windows," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2189–2202, 2012.
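MABO and recall can be computed as sketched below. Following Uijlings et al., the Average Best Overlap (ABO) of a class averages, over that class's ground-truth boxes, the best intersection-over-union (IoU) reached by any proposal in the same image, and MABO is the mean of the per-class ABOs. Counting an object as recalled at IoU ≥ 0.5 is our assumption of the usual threshold; the box format `(x1, y1, x2, y2)` and the helper names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def best_overlaps(objects, candidates):
    """Best IoU between each ground-truth (image_id, box) and that image's proposals."""
    return [max((iou(box, c) for c in candidates.get(img, [])), default=0.0)
            for img, box in objects]


def mabo(ground_truth, candidates):
    """Mean over classes of the Average Best Overlap (ABO) of that class's objects."""
    abos = [sum(best) / len(best)
            for best in (best_overlaps(objs, candidates) for objs in ground_truth.values())]
    return sum(abos) / len(abos)


def recall(ground_truth, candidates, threshold=0.5):
    """Fraction of all ground-truth objects covered by a proposal with IoU >= threshold."""
    best = [o for objs in ground_truth.values() for o in best_overlaps(objs, candidates)]
    return sum(o >= threshold for o in best) / len(best)


gt = {"cat": [("img1", (0, 0, 10, 10))],
      "dog": [("img1", (20, 20, 30, 30)), ("img2", (0, 0, 4, 4))]}
proposals = {"img1": [(0, 0, 10, 10), (20, 20, 30, 25)], "img2": []}
print(mabo(gt, proposals), recall(gt, proposals))  # 0.625 0.666...
```

Here the cat box is matched exactly (ABO 1.0), while the dog class averages a half-overlap and a miss (ABO 0.25), so MABO is 0.625.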
  • 29. Results Summary
      Task                 Ours   State of the art
      Face                 0.66   0.58 [1]
      Scene                0.47   0.48 [5]
      Object               0.78   0.75 [2]
      Attribute ImageNet   0.37   0.36 [3]
      Attribute eBay       0.81   0.79 [4]
      Attribute Bing       0.82   -
    [1] Ozcan, M., Luo, J., Ferrari, V., Caputo, B.: A large-scale database of images and captions for automatic face naming. BMVC (2011) [2] Li, L.J., Fei-Fei, L.: OPTIMOL: automatic online picture collection via incremental model learning. International Journal of Computer Vision 88(2) (2010) 147–168 [3] Russakovsky, O., Fei-Fei, L.: Attribute learning in large-scale datasets. In: Trends and Topics in Computer Vision. Springer (2012) [4] Van De Weijer, J., Schmid, C., Verbeek, J., Larlus, D.: Learning color names for real-world applications. IEEE Transactions on Image Processing (2009) [5] Li, Q., Wu, J., Tu, Z.: Harvesting mid-level visual concepts from large-scale internet images. CVPR (2013)
  • 30. METHOD #2: AME (Association through Model Evolution)
  • 31. Method #2: AME (Association through Model Evolution) ● Iterative data cleansing ● Measures both discriminativeness and representativeness ● Contrasts category instances against random instances
  • 32. AME's motivation ● Another agnostic data-refining method that targets irrelevancy ● Exploits the abundance of random instances, as opposed to the limited supply of annotated instances ● Sidesteps sub-grouping by using very high-dimensional representations
  • 33. AME's method overview ● First, discriminate the category candidates (CC) from a random set (RS). ● Define the category references (CR). ● Second, discriminate the CR from the rest of the CC. ● Identify the spurious instances (SI) with respect to the CR and eliminate them. ● Iterate. Irrelevancy SOLVED!!
  • 34. Step 1 ● Discriminating the category from the random set – Learn a linear model M1 between CC and RS. – Take the most confidently classified CC instances as the CR.
  • 35. Step 2 ● Discriminating the category references from the other candidates – Learn a linear model M2 between the CR and the remaining CC instances. (Figure: models M1 and M2)
  • 36. Step 3 ● Identify the SI with respect to the CR. ● Eliminate the SI. (Figure: model M1 with eliminated instances marked x)
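The three steps can be combined into one AME-style loop. This is a minimal sketch under stated assumptions: plain logistic regression trained by gradient descent stands in for the L1 logistic regression with the Gauss-Seidel solver used in the slides, the CR are the `n_ref` top-scoring candidates under M1, and the spurious instances are taken to be the `n_drop` candidates with the lowest summed M1 and M2 scores (5 per iteration, matching the implementation details). The exact SI criterion and all names are hypothetical.

```python
import numpy as np


def fit_logreg(x, y, lr=0.1, n_iters=300):
    """Plain (unregularized) logistic regression by batch gradient descent; y in {0, 1}."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        w -= lr * (x.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b


def ame(candidates, random_set, n_ref=20, n_drop=5, n_rounds=4):
    """Indices of the candidates kept after n_rounds of AME-style elimination."""
    keep = np.arange(len(candidates))
    for _ in range(n_rounds):
        cc = candidates[keep]
        # Step 1: M1 separates category candidates (label 1) from the random set (label 0).
        w1, b1 = fit_logreg(np.vstack([cc, random_set]),
                            np.r_[np.ones(len(cc)), np.zeros(len(random_set))])
        s1 = cc @ w1 + b1
        refs = np.argsort(-s1)[:n_ref]          # most confident candidates -> CR
        # Step 2: M2 separates the references from the remaining candidates.
        y2 = np.zeros(len(cc))
        y2[refs] = 1.0
        w2, b2 = fit_logreg(cc, y2)
        s2 = cc @ w2 + b2
        # Step 3 (hypothetical rule): the n_drop lowest M1 + M2 scores are spurious.
        keep = np.delete(keep, np.argsort(s1 + s2)[:n_drop])
    return keep


rng = np.random.default_rng(0)
shift = np.r_[np.full(5, 2.0), np.zeros(5)]          # category differs in 5 of 10 dims
category = shift + rng.standard_normal((60, 10))     # indices 0..59: true instances
spurious = rng.standard_normal((10, 10))             # indices 60..69: irrelevant images
random_set = rng.standard_normal((200, 10))
kept = ame(np.vstack([category, spurious]), random_set)
print(len(kept), "kept;", int(np.sum(kept >= 60)), "spurious left")
```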
  • 40. High-Dimensional Representation ● Our problem is discrimination. ● Therefore: – In high dimensions, a category tends to become linearly separable from the others despite its internal sub-groups (category modularity). Sub-grouping SOLVED!!
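A toy illustration of the claim, assuming numpy: a category made of two sub-groups in an XOR layout cannot be linearly separated from the rest in 2-D, but it can after a nonlinear lift to more dimensions. The quadratic lift is our stand-in for AME's learned high-dimensional features, and a perceptron is used only because it provably converges when the data are linearly separable.

```python
import numpy as np


def perceptron_train_accuracy(x, y, epochs=2000):
    """Train a perceptron with bias; return the training accuracy it reaches."""
    a = np.hstack([x, np.ones((len(x), 1))])
    w = np.zeros(a.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(a, y):
            if yi * (xi @ w) <= 0:
                w += yi * xi
                mistakes += 1
        if mistakes == 0:          # a full pass without errors: data separated
            break
    return np.mean(np.sign(a @ w) == y)


def quadratic_lift(x):
    """Map (x1, x2) to (x1, x2, x1^2, x2^2, x1*x2)."""
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([x1, x2, x1 ** 2, x2 ** 2, x1 * x2])


rng = np.random.default_rng(0)
centers = np.array([[2, 2], [-2, -2], [2, -2], [-2, 2]], dtype=float)
labels_per_center = np.array([1, 1, -1, -1])   # the category = two sub-groups
x = np.vstack([c + 0.3 * rng.standard_normal((50, 2)) for c in centers])
y = np.repeat(labels_per_center, 50).astype(float)

acc_2d = perceptron_train_accuracy(x, y)                   # stays below 1.0
acc_lifted = perceptron_train_accuracy(quadratic_lift(x), y)
print(acc_2d, acc_lifted)
```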
  • 41. Feature Learning ● Learn frequent patterns in the data ● Learning pipeline (similar to [1]): 1. Extract random n×n patches from the images. On the collected patches: 2. Contrast normalization 3. ZCA whitening 4. K-means to learn C words. On the whole image: 5. Spatial (max or average) pooling over the C words. ● Result: a 5 × C dimensional representation for each image. (Figure: learned visual words) [1] Coates, Adam, Andrew Y. Ng, and Honglak Lee. "An analysis of single-layer networks in unsupervised feature learning." International Conference on Artificial Intelligence and Statistics. 2011.
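Steps 1 to 4 of the pipeline can be sketched as follows, assuming numpy and gray-level images with 0 to 255 pixel values. The parameter choices echo the implementation details (6×6 patches, images 60 px high, ZCA with ε = 0.5), while the normalization constant `eps=10`, plain Lloyd's k-means, and all helper names are assumptions; the pooling step 5 is omitted.

```python
import numpy as np


def extract_random_patches(images, size, count, rng):
    """Step 1: sample `count` random size x size patches, flattened to rows."""
    out = np.empty((count, size * size))
    for i in range(count):
        img = images[rng.integers(len(images))]
        r = rng.integers(img.shape[0] - size + 1)
        c = rng.integers(img.shape[1] - size + 1)
        out[i] = img[r:r + size, c:c + size].ravel()
    return out


def normalize_patches(patches, eps=10.0):
    """Step 2: per-patch contrast normalization (eps = 10 assumes 0-255 pixel values)."""
    mean = patches.mean(axis=1, keepdims=True)
    var = patches.var(axis=1, keepdims=True)
    return (patches - mean) / np.sqrt(var + eps)


def zca_whitening(x, eps=0.5):
    """Step 3: ZCA whitening; returns whitened data plus the mean and matrix to reuse."""
    mu = x.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(x - mu, rowvar=False))
    zca = eigvec @ np.diag(1.0 / np.sqrt(eigval + eps)) @ eigvec.T
    return (x - mu) @ zca, mu, zca


def kmeans(x, k, n_iters=20, seed=0):
    """Step 4: plain Lloyd's k-means; each centroid is one learned visual word."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iters):
        assign = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = x[assign == j].mean(axis=0)
    return centroids


rng = np.random.default_rng(0)
images = [rng.uniform(0, 255, size=(60, 80)) for _ in range(5)]  # stand-in gray images
patches = extract_random_patches(images, size=6, count=2000, rng=rng)
white, mu, zca = zca_whitening(normalize_patches(patches), eps=0.5)
words = kmeans(white, k=16)
print(patches.shape, words.shape)
```

Because ε is added to every eigenvalue, the whitened covariance has eigenvalues λ/(λ + ε) < 1 rather than exactly 1, which damps low-variance noise directions.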
  • 43. Implementation Details ● AME – L1 logistic regression with the Gauss-Seidel algorithm [1] – Final model: L1 linear SVM with Grafting [2] – 5 images are eliminated at each iteration ● Feature learning – Also use horizontally flipped images – Resize each gray-level image to a height of 60 px – Apply contrast normalization to the random patches – ZCA whitening with ε = 0.5 – Receptive field size of 6×6 pixels – Stride of 1 px, with 2,400 words [1] Shirish Krishnaj Shevade and S Sathiya Keerthi. A simple and efficient algorithm for gene selection using sparse logistic regression. Bioinformatics, 19(17):2246–2253, 2003. [2] Simon Perkins, Kevin Lacker, and James Theiler. Grafting: Fast, incremental feature selection by gradient descent in function space. The Journal of Machine Learning Research, 3:1333–1356, 2003.
  • 45. DATASETS ● FAN-Large [2] – EASY subset: faces larger than 60×70 px, 138 categories – ALL: no constraint, 365 categories ● PubFig83 [1] – Subset of PubFig with 83 celebrities, each with at least 100 images [1] N. Pinto, Z. Stone, T. Zickler, and D. Cox, "Scaling up biologically-inspired computer vision: A case study in unconstrained face recognition on facebook," in Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on, pp. 35–42, IEEE, 2011. [2] M. Ozcan, J. Luo, V. Ferrari, and B. Caputo, "A large-scale database of images and captions for automatic face naming," in BMVC, pp. 1–11, 2011.
  • 46. Classification Pipeline ● No data refining ● Models are trained on the training set of the given dataset ● Results on PubFig83: ~5% improvement over the state of the art, and results improve with more words [1] N. Pinto, Z. Stone, T. Zickler, and D. Cox, "Scaling up biologically-inspired computer vision: A case study in unconstrained face recognition on facebook," in Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on, pp. 35–42, IEEE, 2011. [2] B. C. Becker and E. G. Ortiz, "Evaluating open-universe face identification on the web," in Computer Vision and Pattern Recognition Workshops (CVPRW), 2013 IEEE Conference on, pp. 904–911, IEEE, 2013.
  • 47. AME results ● The baseline is the same classification pipeline without any data refining. ● All models are learned from web images. [18] S. Singh, A. Gupta, and A. A. Efros, "Unsupervised discovery of mid-level discriminative patches," in European Conference on Computer Vision (ECCV), 2012.
  • 48. AME: false vs. true eliminations → Cumulative plot of correct versus false outlier detections until AME has found all the outliers in all classes. Each iteration's values are added to those of the previous iteration.
  • 49. AME: cross-validation accuracies → Cross-validation accuracy (final model) and M1 accuracy as the algorithm proceeds. The two are clearly correlated, and the M1 models do not overfit.
  • 50. AME: number of eliminations vs. accuracy → Effect of the number of outliers removed at each iteration on the final test accuracy. Beyond a certain number, eliminating more instances per iteration degrades final performance; eliminating 1 instance per iteration is the safest choice when no sanity check is available.
  • 52. Which one to choose? ● Polysemy + irrelevancy in the DATA → CMAP ● Only irrelevancy in the DATA → AME ● Another option: – Use AME first, then CMAP – Not yet tested!
  • 53. At the End ● We propose two novel algorithms, CMAP and AME ● Compelling results against state-of-the-art methods on a variety of vision tasks ● Complex visual concepts are learned from a simple query
  • 54. That's the end... Thanks for your valuable time :)