1
Mining Web Images for Concept Learning
A work by Eren Golge
Advisor: Asst. Prof. Dr. Pinar Duygulu
2
Motivation
● Problem
– Hard to obtain large sets of annotated images.
● Solution
– Use weakly labelled images from the Internet.
● But
– Polysemy and irrelevancy in Internet images.
– Visual variations of the targeted concepts (sub-grouping).
● Then
– Use our methods, CMAP and AME :)
3
Hassles
Data size
Model learning
Sub-Grouping
Irrelevancy
Polysemy
4
Meta Pipeline
1. GATHER DATA: weakly labelled images from the Web.
2. Refine DATA: handle polysemy, irrelevancy, and sub-grouping.
3. Learn classifiers → high-quality concept models.
5
Short Retrospective
● Use an annotated control set as a starting point.
– Fergus et al. [1]; OPTIMOL, Li and Fei-Fei [3].
– We use a fully autonomous framework instead.
● Use textual captions.
– Berg et al. [2].
– We use only the visual content.
● Discriminative image cues.
– Efros et al. "Discriminative Patches"; Q. Li et al. [4].
– We use a single computer, with faster and better results.
● CMAP and AME have a broader range of possible applications.
[1] Fergus, R., Fei-Fei, L., Perona, P., Zisserman, A.: Learning object categories from Google's image search. ICCV 2005.
[2] Berg, T.L., Berg, A.C., Edwards, J., Maire, M., White, R., Teh, Y.W., Learned-Miller, E.G., Forsyth, D.A.: Names and faces in the news. CVPR 2004, Vol. 2, 848–854.
[3] Li, L.J., Fei-Fei, L.: OPTIMOL: automatic online picture collection via incremental model learning. IJCV 88(2) (2010) 147–168.
[4] Li, Q., Wu, J., Tu, Z.: Harvesting mid-level visual concepts from large-scale Internet images. CVPR 2013.
6
Method #1
ConceptMap - CMAP
7
Method #1 : CMAP
Clustering + outlier detection → Concept Map (CMAP)
Polysemy and sub-grouping → clustering; irrelevancy → outlier detection
Accepted for
Draft version: http://arxiv.org/abs/1312.4384
8
CMAP's motivation
● A very generic method, applicable to other domains as well (textual, biological, etc.).
● An extension of SOM (a.k.a. Kohonen's map) *.
● Inspired by biological phenomena **.
● Able to cluster data and detect outliers.
● Irrelevancy and Sub-Grouping SOLVED!!
*Kohonen, T.: Self-organizing maps. Springer (1997)
**Hubel, D.H., Wiesel, T.N.: Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. The Journal
of physiology 160(1) (1962) 106
Outlier clusters
Outlier instances in salient clusters
9
CMAP cont': finding outlier units
● Examine the activation statistics of each SOM unit during the learning phase.
● Later learning iterations are more reliable.
IF a unit is activated
RARELY → OUTLIER
FREQUENTLY → SALIENT
Winner activations vs. neighbour activations
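The unit-activation idea above can be sketched as follows. This is an illustrative sketch, not the exact CMAP/RSOM formulation: the exponential late-iteration weighting and the quantile cut-off are assumptions standing in for the statistics the method actually uses.

```python
import numpy as np

def unit_saliency(winner_history, n_units, tau=5.0):
    """Score each SOM unit by how often it wins during training,
    weighting later iterations more heavily (later iterations are
    more reliable). Returns normalised per-unit saliency scores.

    winner_history: sequence of winning unit indices, one per
    training iteration.
    """
    T = len(winner_history)
    # exponential weight that grows toward the end of training
    weights = np.exp(tau * (np.arange(T) / T - 1.0))
    scores = np.zeros(n_units)
    for t, u in enumerate(winner_history):
        scores[u] += weights[t]
    return scores / scores.sum()

def split_units(scores, quantile=0.1):
    """Units at or below the given saliency quantile are treated as
    outlier units; the rest are salient units."""
    cut = np.quantile(scores, quantile)
    outliers = np.where(scores <= cut)[0]
    salient = np.where(scores > cut)[0]
    return salient, outliers
```

A unit that wins only early in training (when the map is still unstable) thus ends up with a low score and is flagged as an outlier unit.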
10
CMAP cont': finding sole outliers
(Figure: individual outlier instances, marked ×, inside otherwise salient clusters.)
11
CMAP mapping
12
CMAP in action
13
Learning Models
● Learn L1 linear SVM models.
– Easier to train.
– Better for high-dimensional data (wide data matrix).
– Implicit feature selection via the L1 norm.
● Learn one linear model from each salient cluster.
● Each concept has multiple models.
– Polysemy SOLVED!!
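A minimal sketch of per-cluster model learning, using scikit-learn's `LinearSVC` with an L1 penalty (the specific solver is an assumption; the thesis only states L1 linear SVMs). One model is trained per salient cluster, so a concept's score is the best response over its cluster models:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_cluster_models(clusters, negatives):
    """Train one L1-regularised linear SVM per salient cluster.

    clusters: list of (n_i, d) arrays, one per salient cluster of a concept.
    negatives: (m, d) array of negatives (outliers + instances sampled
    from other concepts). Returns one linear model per cluster, so a
    concept ends up with multiple models (one per visual sub-group).
    """
    models = []
    for pos in clusters:
        X = np.vstack([pos, negatives])
        y = np.r_[np.ones(len(pos)), np.zeros(len(negatives))]
        # penalty='l1' gives implicit feature selection (sparse weights)
        clf = LinearSVC(penalty='l1', dual=False, C=1.0)
        clf.fit(X, y)
        models.append(clf)
    return models

def concept_score(models, x):
    """Concept confidence = best margin over the cluster models."""
    return max(m.decision_function(x.reshape(1, -1))[0] for m in models)
```

Taking the max over cluster models is what lets one concept cover several visual sub-groups.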
14
CMAP Overview
15
via CMAP
●
Learn Attributes
●
Learn Faces
●
Learn Scenes
●
Learn Objects
●
Improve Detection Performance
16
Experiments
● Only images are used for learning.
● Problems attacked:
– Attribute learning: ImageNet [1], EBAY [2], Bing images.
● Learn texture and colour attributes.
– Scene learning: MIT-Indoor [4], Scene-15 [5].
● Use the learned attributes as mid-level features.
– Face recognition: FAN-Large [6].
– Object recognition: Google dataset [3].
– Faster object detection: enhancement over Selective Search [7].
[1] Russakovsky, O., Fei-Fei, L.: Attribute learning in large-scale datasets. In: Trends and Topics in Computer Vision. Springer (2012)
[2] Van De Weijer, J., Schmid, C., Verbeek, J., Larlus, D.: Learning color names for real-world applications. Image Processing, IEEE (2009)
[3] Fergus, R., Fei-Fei, L., Perona, P., Zisserman, A.: Learning object categories from google’s image search. In: Computer Vision, 2005. ICCV 2005
[4] Quattoni, A., Torralba, A.: Recognizing indoor scenes. CVPR (2009)
[5] Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. CVPR 2006
[6] Ozcan, M., Luo, J., Ferrari, V., Caputo, B.: A large-scale database of images and captions for automatic face naming. In: BMVC. (2011)
[7] Uijlings, Jasper RR, et al. "Selective search for object recognition." International journal of computer vision 104.2 (2013): 154-171.
17
Visual Examples
18
19
Visual Examples # Faces
Salient Clusters Outlier Clusters Outlier Instances
20
Salient Clusters Outlier Clusters Outlier Instances
21
Implementation Details
● Visual features:
– BoW SIFT with 4000 words (texture attributes, objects, faces).
– 3D 10x20x20 Lab histograms (attributes).
– 256-dimensional LBP [1] (objects, faces).
● Preprocessing
– Attribute: extract random 100x100 non-overlapping patches from each image.
– Scene: represent each image with the confidence scores of the attribute classifiers, in a spatial-pyramid fashion.
– Face: apply face detection [2] to each image and keep the single highest-scoring patch.
– Object: apply unsupervised saliency detection [3] and keep the single highest-activation region.
● Model learning
– Use the outliers, plus a sample of instances from other concepts, as the negative set.
– Apply hard mining [4].
– Tune all hyper-parameters (classifier and RSOM parameters) via cross-validation.
● NOTICE:
– We train the concept models on Google images, so we also deal with DOMAIN ADAPTATION.
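The hard-mining step [4] can be sketched as follows. This is an illustrative loop, not the thesis implementation: scikit-learn's `LinearSVC` stands in for the actual classifier, and `n_init`, `rounds`, and `batch` are hypothetical parameters.

```python
import numpy as np
from sklearn.svm import LinearSVC

def hard_mine(pos, neg_pool, n_init=100, rounds=3, batch=50, seed=0):
    """Hard-negative mining sketch: train on an initial random sample
    of negatives, then repeatedly add the negatives the current model
    scores highest (the 'hard' ones) and retrain."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(neg_pool), size=min(n_init, len(neg_pool)),
                     replace=False)
    neg = neg_pool[idx]
    clf = None
    for _ in range(rounds):
        X = np.vstack([pos, neg])
        y = np.r_[np.ones(len(pos)), np.zeros(len(neg))]
        clf = LinearSVC(dual=False).fit(X, y)
        # negatives with the highest positive score are most confusing
        scores = clf.decision_function(neg_pool)
        hard = neg_pool[np.argsort(scores)[-batch:]]
        neg = np.vstack([neg, hard])
    return clf
```

Restricting each round to the hard negatives keeps the training set small while still exposing the model to the negatives it actually confuses with the concept.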
[1] Ojala, T., Pietikainen, M., Maenpaa, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. Pattern Analysis and
Machine Intelligence, IEEE Transactions on 24(7) (2002) 971–987
[2] Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. In: Computer Vision and Pattern Recognition (CVPR), 2012
IEEE Conference on, IEEE (2012) 2879–2886
[3] Erdem, E., Erdem, A.: Visual saliency estimation by nonlinearly integrating features using region covariances. Journal of Vision 13(4) (2013) 1–20
[4] Felzenszwalb, Pedro F., et al. "Object detection with discriminatively trained part-based models." Pattern Analysis and Machine Intelligence, IEEE
Transactions on 32.9 (2010): 1627-1645.
22
RESULTS
23
Attribute Learning
                    Ours   State of the art
Attribute ImageNet  0.37   0.36 [4]
Attribute EBAY      0.81   0.79 [3]
Attribute Bing      0.82   -

[3] Van De Weijer, J., Schmid, C., Verbeek, J., Larlus, D.: Learning color names for real-world applications. Image Processing, IEEE (2009)
[4] Russakovsky, O., Fei-Fei, L.: Attribute learning in large-scale datasets. In: Trends and Topics in Computer Vision. Springer (2012)
24
Scene Learning
                     MIT-Indoor   Scene-15
CMAP-A               46.2%        82.7%
CMAP-S               40.8%        80.7%
CMAP-S+HM            41.7%        81.3%
Li et al. [1]        47.6%        82.1%
Pandey et al. [2]    43.1%        -
Kwitt et al. [3]     44%          82.3%
Lazebnik et al. [4]  -            81%
Singh et al. [5]     38%          77%

CMAP-A: attribute-based scene learning.
CMAP-S: scene learning directly from CMAP.
CMAP-S+HM: scene learning from CMAP with hard mining.
[1] Q. Li, J. Wu, and Z. Tu, “Harvesting mid-level visual concepts from large-scale internet images,” CVPR, 2013.
[2] M. Pandey and S. Lazebnik, “Scene recognition and weakly supervised object localization with deformable part-based models,” ICCV, 2011.
[3] R. Kwitt, N. Vasconcelos, and N. Rasiwasia, “Scene recognition on the semantic manifold,” ECCV, 2012.
[4] S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories,” in Computer Vision
and Pattern Recognition, 2006”
[5] S. Singh, A. Gupta, and A. A. Efros, “Unsupervised discovery of mid-level discriminative patches,” in European Conference Computer Vision (ECCV), 2012.
25
Object Learning
           CMAP   [1]    [2]
airplane   0.63   0.51   0.76
car        0.97   0.98   0.94
face       0.67   0.52   0.82
guitar     0.89   0.81   0.60
leopard    0.76   0.74   0.89
motorbike  0.98   0.98   0.67
watch      0.55   0.48   0.53
overall    0.78   0.72   0.75

[1] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman, "Learning object categories from Google's image search," ICCV 2005, vol. 2, pp. 1816–1823.
[2] L.-J. Li and L. Fei-Fei, "OPTIMOL: automatic online picture collection via incremental model learning," IJCV, vol. 88, no. 2, pp. 147–168, 2010.

→ Dataset provided by [1]
26
Face Learning
      GBC+CF (half) [1]   CMAP-1   CMAP-2   Baseline
EASY  0.58                0.63     0.66     0.31
HARD  0.32                0.34     0.38     0.18

→ Face learning results, with faces detected by the OpenCV detector.
[1] M. Ozcan, J. Luo, V. Ferrari, and B. Caputo, “A large-scale database of images and captions for automatic face
naming.,” in BMVC, pp. 1–11, 2011.
27
Selective Search
From “Uijlings, Jasper RR, et al. "Selective search for object recognition." International journal of computer vision 104.2 (2013): 154-171.”
28
Selective Search with CMAP
● Remove outlier candidate regions from the detection tree of Selective Search [1].
● ~3,500 fewer candidate regions per image, with better Recall and MABO*.

                         MABO   Recall   No. of windows
Objectness [2]           0.69   0.94     1,853
Selective Search [1]     0.87   0.99     10,097
Selective Search + CMAP  0.89   0.99     6,753

* MABO: Mean Average Best Overlap.

[1] Uijlings, Jasper RR, et al. "Selective search for object recognition." International journal of computer vision 104.2 (2013): 154-171.
[2] B. Alexe, T. Deselaers, and V. Ferrari, "Measuring the objectness of image windows," PAMI, vol. 34, no. 11, pp. 2189–2202, 2012.
29
Results Summary
                    Ours   State of the art
Face                0.66   0.58 [1]
Scene               0.47   0.48 [5]
Object              0.78   0.75 [2]
Attribute ImageNet  0.37   0.36 [3]
Attribute EBAY      0.81   0.79 [4]
Attribute Bing      0.82   -
[1] Ozcan, M., Luo, J., Ferrari, V., Caputo, B.: A large-scale database of images and captions for automatic face naming. BMVC. (2011)
[2] Fergus, R., Fei-Fei, L., Perona, P., Zisserman, A.: Learning object categories from google’s image search. In: Computer Vision, 2005. ICCV 2005
[3] Russakovsky, O., Fei-Fei, L.: Attribute learning in large-scale datasets. In: Trends and Topics in Computer Vision. Springer (2012)
[4] Van De Weijer, J., Schmid, C., Verbeek, J., Larlus, D.: Learning color names for real-world applications. Image Processing, IEEE (2009)
[5] Li, Q., Wu, J., Tu, Z.: Harvesting mid-level visual concepts from large-scale internet images. CVPR (2013)
30
METHOD #2
AME – Association through Model Evolution
31
Method#2:AME
(Association through Model Evolution)
● Iterative data cleansing.
● Measures discriminativeness and representativeness.
● Defines the category against random instances.
32
AME's motivation
● Another agnostic data-refining method, targeting irrelevancy.
● Exploits the abundance of random instances, as opposed to the limited annotated instances.
● Evades sub-grouping by using very high-dimensional representations.
33
AME's method overview
● First, discern the category candidates (CC) from a random set (RS).
● Define the category references (CR).
● Second, discern CR from CC.
● Define the spurious instances (SI) against CR, and eliminate them.
● Re-iterate.
Irrelevancy SOLVED!!
34
Step1
● Discerning the category from the random set.
– Learn a linear model M1 between CC and RS.
– Take the most confidently classified instances as the CR.
35
Step2
● Discerning the category references from the others.
– Learn a linear model M2 between CR and the others.
36
Step3
● Define SI against CR.
● Eliminate SI.
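The three steps above can be sketched as one AME round. This is a simplification under stated assumptions: scikit-learn's liblinear L1 logistic regression stands in for the Gauss-Seidel solver of [1], and `n_refs` is an illustrative parameter.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ame_iteration(cc, rs, n_refs=20, n_eliminate=5):
    """One AME round (sketch): (1) train M1 on category candidates (CC)
    vs the random set (RS) and take the most confidently classified
    candidates as category references (CR); (2) train M2 on CR vs the
    remaining candidates; (3) eliminate the candidates M2 scores lowest
    (the spurious instances). Returns the pruned candidate set."""
    # Step 1: M1 = CC vs RS; most confident candidates become CR
    X1 = np.vstack([cc, rs])
    y1 = np.r_[np.ones(len(cc)), np.zeros(len(rs))]
    m1 = LogisticRegression(penalty='l1', solver='liblinear').fit(X1, y1)
    conf = m1.decision_function(cc)
    order = np.argsort(conf)
    cr, rest = cc[order[-n_refs:]], cc[order[:-n_refs]]
    # Step 2: M2 = CR vs the remaining candidates
    X2 = np.vstack([cr, rest])
    y2 = np.r_[np.ones(len(cr)), np.zeros(len(rest))]
    m2 = LogisticRegression(penalty='l1', solver='liblinear').fit(X2, y2)
    # Step 3: eliminate the candidates that look least like the references
    scores = m2.decision_function(cc)
    keep = np.argsort(scores)[n_eliminate:]
    return cc[keep]
```

Re-iterating the function on its own output gives the incremental cleansing loop; the thesis eliminates 5 images per iteration.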
37
Visual Examples
38
39
40
High Dimensional Representation
● Our problem is discrimination.
● Therefore:
– In high dimensions, any category becomes linearly separable from the others, despite category sub-modularity.
Sub-grouping solved!!
41
Feature Learning
● Learn frequent patterns from the data.
● Learning pipeline (similar to [1]):
1. Scrape random n x n patches from the images.
Over the collected patches:
2. Contrast normalisation.
3. ZCA whitening.
4. K-means for C words.
Over the whole image:
5. Spatial (max or avg) pooling over the C words.
→ Learned visual words: a {5 x C}-dimensional representation for each image.

[1] Coates, Adam, Andrew Y. Ng, and Honglak Lee. "An analysis of single-layer networks in unsupervised feature learning." International Conference on Artificial Intelligence and Statistics. 2011.
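Steps 1–4 of the pipeline (dictionary learning) can be sketched in plain NumPy. The patch counts, the ε inside the contrast normalisation, and the bare-bones Lloyd k-means are simplifications of [1], not the thesis settings:

```python
import numpy as np

def learn_dictionary(images, n_patches=10000, p=6, C=50, eps=0.5, seed=0):
    """Single-layer feature learning sketch (after Coates et al.):
    sample random p x p patches, contrast-normalise, ZCA-whiten,
    then run k-means for C visual words."""
    rng = np.random.default_rng(seed)
    # 1. scrape random p x p patches from the images
    patches = []
    for _ in range(n_patches):
        img = images[rng.integers(len(images))]
        i = rng.integers(img.shape[0] - p)
        j = rng.integers(img.shape[1] - p)
        patches.append(img[i:i + p, j:j + p].ravel())
    X = np.array(patches, dtype=float)
    # 2. contrast normalisation (zero mean, regularised unit variance)
    X -= X.mean(axis=1, keepdims=True)
    X /= np.sqrt(X.var(axis=1, keepdims=True) + 10)
    # 3. ZCA whitening
    cov = np.cov(X, rowvar=False)
    U, S, _ = np.linalg.svd(cov)
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T
    X = X @ W
    # 4. k-means for C words (a few plain Lloyd iterations)
    centers = X[rng.choice(len(X), C, replace=False)]
    for _ in range(10):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for c in range(C):
            if (assign == c).any():
                centers[c] = X[assign == c].mean(0)
    return centers, W
```

Step 5 (spatial pooling of word responses over four image quadrants plus the whole image) is what yields the 5 x C dimensions per image.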
42
AME overview
43
Implementation Details
● AME
– L1 logistic regression with the Gauss-Seidel algorithm [1].
– Final model: L1 linear SVM with grafting [2].
– At each iteration, 5 images are eliminated.
● Feature learning
– Use horizontally flipped images as well.
– Resize each grey-level image to 60 px height.
– Contrast normalisation on the random patches.
– ZCA whitening with ε = 0.5.
– Receptive field size 6x6 pixels.
– 1 px stride, 2400 words.
[1] Shirish Krishnaj Shevade and S Sathiya Keerthi. A simple and efficient algorithm for gene selection using sparse logistic regression.
Bioinformatics,19(17):2246–2253, 2003.
[2] Simon Perkins, Kevin Lacker, and James Theiler. Grafting: Fast, incremental feature selection by gradient descent in function space. The Journal of
Machine Learning Research, 3:1333–1356, 2003.
44
RESULTS
45
DATASETS
● FAN-Large [2]
– EASY subset: faces larger than 60x70 px, 138 categories.
– ALL: no constraint, 365 categories.
● PubFig83 [1]
– Subset of PubFig with 83 celebrities, at least 100 images each.
[1]N. Pinto, Z. Stone, T. Zickler, and D. Cox, “Scaling up biologically-inspired computer vision: A case study in unconstrained face
recognition on facebook,” in Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE
Computer Society Conference on, pp. 35–42, IEEE, 2011.
[2] M. Ozcan, J. Luo, V. Ferrari, and B. Caputo, “A large-scale database of images and captions for automatic face naming.,” in
BMVC, pp. 1–11, 2011.
46
Classification Pipeline
● No data refining.
● Models are trained on the training set of the given dataset.
● Results on PubFig83:
[1]N. Pinto, Z. Stone, T. Zickler, and D. Cox, “Scaling up biologically-inspired computer vision: A case
study in unconstrained face recognition on facebook,” in Computer Vision and Pattern Recognition
Workshops (CVPRW), 2011 IEEE Computer Society Conference on, pp. 35–42, IEEE, 2011
[2]B. C. Becker and E. G. Ortiz, “Evaluating open-universe face identification on the web,” in Computer
Vision and Pattern Recognition Workshops (CVPRW), 2013 IEEE Conference on, pp. 904–911, IEEE,
2013.
● ~5% improvement over the state of the art.
● Better results with more words.
47
AME results
● The baseline is the same classification pipeline without any data refining.
● All models are learned from web images.
48
AME-- False vs True Elimination
→ Cumulative plot of correct versus false outlier detections, until AME finds all the outliers for all classes. Each iteration's values are aggregated with the previous iteration's.
49
AME-- X-val Accuracies
→ Cross-validation (final-model) and M1 accuracies as the algorithm proceeds. This shows a clear correlation between the cross-validation classifier and the M1 models, without the M1 models over-fitting.
50
AME-- # Elimination vs Accuracy
→ Effect of the number of outliers removed per iteration on the final test accuracy. Elimination beyond some limit degrades final performance; eliminating 1 instance per iteration is the safe choice when no sanity check is available.
51
Final Words
52
Which one to Choose?
● Polysemy + irrelevancy in the DATA → CMAP
● Only irrelevancy in the DATA → AME
● Another choice:
– Use AME first, then CMAP.
– Not tested!
53
At the End
● We propose two novel algorithms, CMAP and AME.
● Compelling results against state-of-the-art methods on a variety of vision tasks.
● Learn complex visual concepts from a simple query.
54
That's the end ... Thanks for
your valuable time :)

 
Ai in communication electronicss[1].pptx
Ai in communication electronicss[1].pptxAi in communication electronicss[1].pptx
Ai in communication electronicss[1].pptxsubscribeus100
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationColumbia Weather Systems
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests GlycosidesNandakishor Bhaurao Deshmukh
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxmaryFF1
 
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫qfactory1
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 

Dernier (20)

GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
Biological classification of plants with detail
Biological classification of plants with detailBiological classification of plants with detail
Biological classification of plants with detail
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
AZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTXAZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTX
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Ai in communication electronicss[1].pptx
Ai in communication electronicss[1].pptxAi in communication electronicss[1].pptx
Ai in communication electronicss[1].pptx
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 

Eren_Golge_MS_Thesis_2014

  • 8. CMAP's motivation
    ● A very generic method, applicable to other domains as well (textual, biological, etc.)
    ● Extension of SOM (a.k.a. Kohonen's Map) *
    ● Inspired by biological phenomena **
    ● Able to cluster data and detect outliers
    ● Irrelevancy and Sub-Grouping SOLVED!!
    [Figure: outlier clusters, and outlier instances inside salient clusters]
    * Kohonen, T.: Self-organizing maps. Springer (1997)
    ** Hubel, D.H., Wiesel, T.N.: Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of Physiology 160(1) (1962) 106
  • 9. CMAP cont. – finding outlier units
    ● Look at the activation statistics of each SOM unit during the learning phase
    ● Later learning iterations are more reliable
    ● IF a unit is activated RARELY → OUTLIER, FREQUENTLY → SALIENT
    [Figure: winner activations and neighbor activations]
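The rare-vs-frequent rule above can be sketched as a weighted activation count over SOM training iterations. This is only an illustrative sketch: the function names and the linear weighting of later iterations are assumptions, not the thesis' exact formulation.

```python
# Hypothetical sketch of the rare-vs-frequent rule; the linear iteration
# weighting is an assumption, not CMAP's exact form.

def unit_saliency(activations_per_iter, n_units):
    """activations_per_iter: one list of winning unit indices per SOM
    training iteration. Returns a normalised activation score per unit."""
    n_iters = len(activations_per_iter)
    scores = [0.0] * n_units
    for t, winners in enumerate(activations_per_iter, start=1):
        weight = t / n_iters  # later iterations are more reliable
        for unit in winners:
            scores[unit] += weight
    total = sum(scores) or 1.0
    return [s / total for s in scores]

def split_units(scores, threshold):
    """Units activated rarely are outlier units; frequently, salient."""
    salient = [u for u, s in enumerate(scores) if s >= threshold]
    outlier = [u for u, s in enumerate(scores) if s < threshold]
    return salient, outlier
```

The threshold separating rare from frequent units would itself be tuned, e.g. via the cross-validation of RSOM parameters mentioned later in the implementation details.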
  • 10. CMAP cont. – finding sole outliers
    [Figure: individual outlier instances, marked ×, inside otherwise salient clusters]
  • 13. Learning Models
    ● Learn L1 linear SVM models
      – Easier to train
      – Better for high-dimensional data (wide data matrix)
      – Implicit feature selection by the L1 norm
    ● Learn one linear model from each salient cluster
    ● Each concept has multiple models
      – Polysemy SOLVED!!
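Since each concept ends up with one linear model per salient cluster, a test image needs a rule for combining their responses. A minimal sketch follows; taking the maximum response over the per-cluster models is an assumption (the slides do not state the combination rule), and the toy weights stand in for trained L1-SVM weights.

```python
# Sketch of multi-model concept scoring. Max-pooling over per-cluster
# models is an assumed combination rule, not stated in the slides.

def linear_score(w, b, x):
    """Response of one linear model (w, b) on feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def concept_score(models, x):
    """models: list of (w, b) pairs, one per salient cluster of a concept.
    The concept's score is the strongest per-cluster response."""
    return max(linear_score(w, b, x) for w, b in models)
```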
  • 15. via CMAP
    ● Learn Attributes
    ● Learn Faces
    ● Learn Scenes
    ● Learn Objects
    ● Improve Detection Performance
  • 16. Experiments
    ● Only images are used for learning
    ● Problems attacked:
      – Attribute Learning: ImageNet [1], ebay and bing images [2]
        ● Learn texture and color attributes
      – Scene Learning: MIT-indoor [4], Scene-15 [5]
        ● Use the learned attributes as mid-level features
      – Face Recognition: FAN-Large [6]
      – Object Recognition: Google dataset [3]
      – Faster Object Detection: enhancement over Selective Search [7]
    [1] Russakovsky, O., Fei-Fei, L.: Attribute learning in large-scale datasets. In: Trends and Topics in Computer Vision. Springer (2012)
    [2] Van De Weijer, J., Schmid, C., Verbeek, J., Larlus, D.: Learning color names for real-world applications. Image Processing, IEEE (2009)
    [3] Fergus, R., Fei-Fei, L., Perona, P., Zisserman, A.: Learning object categories from google's image search. ICCV 2005
    [4] Quattoni, A., Torralba, A.: Recognizing indoor scenes. CVPR (2009)
    [5] Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. CVPR 2006
    [6] Ozcan, M., Luo, J., Ferrari, V., Caputo, B.: A large-scale database of images and captions for automatic face naming. BMVC (2011)
    [7] Uijlings, Jasper RR, et al.: Selective search for object recognition. International Journal of Computer Vision 104.2 (2013) 154–171
  • 19. Visual Examples – Faces
    [Figure: salient clusters, outlier clusters, outlier instances]
  • 20. [Figure: salient clusters, outlier clusters, outlier instances]
  • 21. Implementation Details
    ● Visual features:
      – BoW SIFT with 4000 words (for texture attributes, objects and faces)
      – 3D 10x20x20 Lab histograms (for color attributes)
      – 256-dimensional LBP [1] (for objects and faces)
    ● Preprocessing:
      – Attribute: extract random 100x100 non-overlapping image patches from each image
      – Scene: represent each image with the confidence scores of the attribute classifiers, in a Spatial Pyramid sense
      – Face: apply face detection [2] to each image and keep the single highest-scoring patch
      – Object: apply unsupervised saliency detection [3] to the images and keep the single highest-activation region
    ● Model learning:
      – Use outliers and a sample of other concepts' instances as the negative set
      – Apply hard mining [4]
      – Tune all hyper-parameters (classifier and RSOM parameters) via cross-validation
    ● NOTICE: we train concept models on Google images, and thus deal with DOMAIN ADAPTATION
    [1] Ojala, T., Pietikainen, M., Maenpaa, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE TPAMI 24(7) (2002) 971–987
    [2] Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. CVPR 2012, 2879–2886
    [3] Erdem, E., Erdem, A.: Visual saliency estimation by nonlinearly integrating features using region covariances. Journal of Vision 13(4) (2013) 1–20
    [4] Felzenszwalb, Pedro F., et al.: Object detection with discriminatively trained part-based models. IEEE TPAMI 32.9 (2010) 1627–1645
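The 3-D Lab colour histogram above can be sketched minimally. The bin layout (10x20x20) follows the slide, while the channel value ranges, the clamping, and the normalisation are assumptions for illustration.

```python
# Toy sketch of a 3-D Lab histogram descriptor. Bin counts follow the
# slide (10x20x20); the Lab value ranges are assumed, not stated there.

def lab_histogram(pixels, bins=(10, 20, 20),
                  ranges=((0, 100), (-128, 128), (-128, 128))):
    """pixels: iterable of (L, a, b) tuples. Returns a sparse dict
    mapping 3-D bin index -> normalised frequency."""
    hist = {}
    for p in pixels:
        idx = []
        for value, n_bins, (lo, hi) in zip(p, bins, ranges):
            k = int((value - lo) / (hi - lo) * n_bins)
            idx.append(min(max(k, 0), n_bins - 1))  # clamp to valid bins
        key = tuple(idx)
        hist[key] = hist.get(key, 0) + 1
    total = float(len(pixels)) or 1.0
    return {k: c / total for k, c in hist.items()}
```

A dense 10x20x20 = 4000-dimensional vector is obtained by flattening the sparse dict over all bin indices.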
  • 23. Attribute Learning

                          Ours   State of the art
    Attribute ImageNet    0.37   0.36 [4]
    Attribute ebay        0.81   0.79 [3]
    Attribute bing        0.82   –

    [3] Van De Weijer, J., Schmid, C., Verbeek, J., Larlus, D.: Learning color names for real-world applications. Image Processing, IEEE (2009)
    [4] Russakovsky, O., Fei-Fei, L.: Attribute learning in large-scale datasets. In: Trends and Topics in Computer Vision. Springer (2012)
  • 24. Scene Learning

                          MIT-indoor   Scene-15
    CMAP-A                46.2%        82.7%
    CMAP-S                40.8%        80.7%
    CMAP-S+HM             41.7%        81.3%
    Li et al. [1]         47.6%        82.1%
    Pandey et al. [2]     43.1%        –
    Kwitt et al. [3]      44%          82.3%
    Lazebnik et al. [4]   –            81%
    Singh et al. [5]      38%          77%

    CMAP-A: attribute-based scene learning. CMAP-S: scene learning directly from CMAP. CMAP-S+HM: scene learning from CMAP with hard mining.

    [1] Q. Li, J. Wu, and Z. Tu, "Harvesting mid-level visual concepts from large-scale internet images," CVPR, 2013.
    [2] M. Pandey and S. Lazebnik, "Scene recognition and weakly supervised object localization with deformable part-based models," ICCV, 2011.
    [3] R. Kwitt, N. Vasconcelos, and N. Rasiwasia, "Scene recognition on the semantic manifold," ECCV, 2012.
    [4] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," CVPR, 2006.
    [5] S. Singh, A. Gupta, and A. A. Efros, "Unsupervised discovery of mid-level discriminative patches," ECCV, 2012.
  • 25. Object Learning

               CMAP   [1]    [2]
    airplane   0.63   0.51   0.76
    car        0.97   0.98   0.94
    face       0.67   0.52   0.82
    guitar     0.89   0.81   0.60
    leopard    0.76   0.74   0.89
    motorbike  0.98   0.98   0.67
    watch      0.55   0.48   0.53
    overall    0.78   0.72   0.75

    → Dataset provided by [1]

    [1] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman, "Learning object categories from google's image search," ICCV, 2005.
    [2] L.-J. Li and L. Fei-Fei, "Optimol: automatic online picture collection via incremental model learning," International Journal of Computer Vision, vol. 88, no. 2, pp. 147–168, 2010.
  • 26. Face Learning

          GBC+CF(half) [1]   CMAP-1   CMAP-2   Baseline
    EASY  0.58               0.63     0.66     0.31
    HARD  0.32               0.34     0.38     0.18

    → Face learning results, with faces detected using the OpenCV detector

    [1] M. Ozcan, J. Luo, V. Ferrari, and B. Caputo, "A large-scale database of images and captions for automatic face naming," BMVC, 2011.
  • 27. Selective Search
    From Uijlings, Jasper RR, et al. "Selective search for object recognition." International Journal of Computer Vision 104.2 (2013): 154–171.
  • 28. Selective Search with CMAP
    ● Remove outlier candidate regions from the detection tree of Selective Search [1]
    ● ~3,500 fewer candidate regions per image, with better Recall and MABO*

                              MABO   Recall   No. of Windows
    Objectness [2]            0.69   0.94     1,853
    Selective Search [1]      0.87   0.99     10,097
    Selective Search + CMAP   0.89   0.99     6,753

    * MABO: Mean Average Best Overlap

    [1] Uijlings, Jasper RR, et al. "Selective search for object recognition." International Journal of Computer Vision 104.2 (2013): 154–171.
    [2] B. Alexe, T. Deselaers, and V. Ferrari, "Measuring the objectness of image windows," IEEE TPAMI, vol. 34, no. 11, pp. 2189–2202, 2012.
  • 29. Results Summary

                         Ours   State of the art
    Face                 0.66   0.58 [1]
    Scene                0.47   0.48 [5]
    Object               0.78   0.75 [2]
    Attribute ImageNet   0.37   0.36 [3]
    Attribute ebay       0.81   0.79 [4]
    Attribute bing       0.82   –

    [1] Ozcan, M., Luo, J., Ferrari, V., Caputo, B.: A large-scale database of images and captions for automatic face naming. BMVC (2011)
    [2] Fergus, R., Fei-Fei, L., Perona, P., Zisserman, A.: Learning object categories from google's image search. ICCV 2005
    [3] Russakovsky, O., Fei-Fei, L.: Attribute learning in large-scale datasets. In: Trends and Topics in Computer Vision. Springer (2012)
    [4] Van De Weijer, J., Schmid, C., Verbeek, J., Larlus, D.: Learning color names for real-world applications. Image Processing, IEEE (2009)
    [5] Li, Q., Wu, J., Tu, Z.: Harvesting mid-level visual concepts from large-scale internet images. CVPR (2013)
  • 30. Method #2: AME – Association through Model Evolution
  • 31. Method #2: AME (Association through Model Evolution)
    ● Iterative data cleansing
    ● Measures discriminativeness and representativeness
    ● Defines the category against random instances
  • 32. AME's motivation
    ● Another agnostic data-refining method against Irrelevancy
    ● Makes use of abundant random instances, as opposed to the limited annotated instances
    ● Evades Sub-Grouping by using very high-dimensional representations
  • 33. AME's method overview
    ● First, discern category candidates (CC) from the random set (RS)
    ● Define category references (CR)
    ● Second, discern CR from CC
    ● Define spurious instances (SI) against CR and eliminate them
    ● Re-iterate
    Irrelevancy SOLVED!!
  • 34. Step 1
    ● Discerning the category from the random set
      – Learn a linear model M1 between CC and RS
      – Take the most confidently classified instances as the CR
  • 35. Step 2
    ● Discerning the category references from the others
      – Learn a linear model M2 between CR and the others
    [Figure: M1 and M2 decision boundaries]
  • 36. Step 3
    ● Define SI against CR
    ● Eliminate SI
    [Figure: eliminated instances marked ×]
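A single AME iteration over the three steps above can be sketched on toy 1-D data. The midpoint-threshold "linear model" here is only a stand-in for the L1 logistic regression the thesis actually uses, and the function names, reference count, and elimination count are illustrative assumptions.

```python
# Toy sketch of one AME iteration (steps 1-3) on 1-D data. The threshold
# "model" stands in for L1 logistic regression; all names are illustrative.

def fit_threshold(pos, neg):
    """Toy linear model: score(x) = x - midpoint of the two class means."""
    mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0
    return lambda x: x - mid  # positive score => predicted positive class

def ame_iteration(candidates, random_set, n_refs=2, n_drop=1):
    # Step 1: discern category candidates (CC) from the random set (RS);
    # the most confidently classified candidates become references (CR).
    m1 = fit_threshold(candidates, random_set)
    refs = sorted(candidates, key=m1, reverse=True)[:n_refs]
    others = [x for x in candidates if x not in refs]
    # Step 2: discern CR from the remaining candidates with a second model.
    m2 = fit_threshold(refs, others)
    # Step 3: the candidates scored lowest against CR are spurious; drop them.
    spurious = sorted(others, key=m2)[:n_drop]
    return [x for x in candidates if x not in spurious]
```

Re-running this function on its own output until no spurious instances remain (or a validation criterion stops improving) mirrors the "Re-iterate" step.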
  • 40. High-Dimensional Representation
    ● Our problem is discrimination
    ● Therefore:
      – High dimensionality makes any category linearly separable from the others, despite category sub-modularity
    Sub-Grouping solved!!
  • 41. Feature Learning
    ● Learn frequent patterns in the data
    ● Learning pipeline (similar to [1]):
      1. Extract random n×n patches from the images
      Over the collected patches:
      2. Contrast normalization
      3. ZCA whitening
      4. K-means for C words
      Over the whole image:
      5. Spatial (max or avg) pooling over the C learned visual words
      → { 5 × C words } dimensions per image
    [1] Coates, Adam, Andrew Y. Ng, and Honglak Lee. "An analysis of single-layer networks in unsupervised feature learning." AISTATS, 2011.
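Step 2 of the pipeline, per-patch contrast normalisation, can be sketched as below; the epsilon guard against flat (zero-variance) patches is an assumption of this sketch.

```python
# Sketch of per-patch contrast normalisation (pipeline step 2):
# subtract the patch mean and divide by its standard deviation.
# The eps guard for flat patches is an assumption.

def contrast_normalise(patch, eps=1e-8):
    """patch: flat list of pixel intensities. Returns the normalised list."""
    n = len(patch)
    mean = sum(patch) / n
    var = sum((v - mean) ** 2 for v in patch) / n
    return [(v - mean) / ((var + eps) ** 0.5) for v in patch]
```

ZCA whitening (step 3) is then applied to the normalised patches jointly, decorrelating them before the k-means codebook is learned.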
  • 43. Implementation Details
    ● AME
      – L1 Logistic Regression with the Gauss-Seidel algorithm [1]
      – Final model: L1 Linear SVM with Grafting [2]
      – 5 images eliminated at each iteration
    ● Feature Learning
      – Use horizontally flipped images
      – Resize each gray-level image to 60 px height
      – Contrast normalization on the random patches
      – ZCA whitening with ε = 0.5
      – Receptive field size 6×6 pixels
      – 1 px stride, 2400 words
    [1] Shirish Krishnaj Shevade and S Sathiya Keerthi. A simple and efficient algorithm for gene selection using sparse logistic regression. Bioinformatics, 19(17):2246–2253, 2003.
    [2] Simon Perkins, Kevin Lacker, and James Theiler. Grafting: Fast, incremental feature selection by gradient descent in function space. The Journal of Machine Learning Research, 3:1333–1356, 2003.
  • 45. Datasets
    ● FAN-Large [2]
      – EASY subset: faces larger than 60×70 px, 138 categories
      – ALL: no constraint, 365 categories
    ● PubFig83 [1]
      – Subset of PubFig with 83 celebrities, at least 100 images each
    [1] N. Pinto, Z. Stone, T. Zickler, and D. Cox, "Scaling up biologically-inspired computer vision: A case study in unconstrained face recognition on facebook," CVPRW, 2011.
    [2] M. Ozcan, J. Luo, V. Ferrari, and B. Caputo, "A large-scale database of images and captions for automatic face naming," BMVC, 2011.
  • 46. Classification Pipeline
    ● No data refining
    ● Models are trained on the training set of the given dataset
    ● Results on PubFig83
    ● ~5% improvement over the state of the art
    ● Better results with more words
    [1] N. Pinto, Z. Stone, T. Zickler, and D. Cox, "Scaling up biologically-inspired computer vision: A case study in unconstrained face recognition on facebook," CVPRW, 2011.
    [2] B. C. Becker and E. G. Ortiz, "Evaluating open-universe face identification on the web," CVPRW, 2013.
  • 47. AME Results
    ● The baseline is the same classification pipeline without any data refining
    ● All models are learned from web images
    [18] S. Singh, A. Gupta, and A. A. Efros, "Unsupervised discovery of mid-level discriminative patches," ECCV, 2012.
  • 48. AME – False vs. True Elimination
    → Cumulative plot of correct versus false outlier detections, until AME finds all the outliers for all classes. Each iteration's values are aggregated with those of the previous iteration.
  • 49. AME – Cross-validation Accuracies
    → Cross-validation (final-model) and M1 accuracies as the algorithm proceeds. This shows a clear correlation between the cross-validation classifier and the M1 models, without the M1 models over-fitting.
  • 50. AME – Number of Eliminations vs. Accuracy
    → Effect of the number of outliers removed at each iteration on the final test accuracy. Eliminating beyond some limit degrades the final performance; eliminating 1 instance per iteration is the safe choice without any sanity check.
  • 52. Which One to Choose?
    ● Polysemy + Irrelevancy in the data → CMAP
    ● Only Irrelevancy in the data → AME
    ● Another option:
      – Use AME first, then CMAP
      – Not tested!
  • 53. At the End
    ● We propose two novel algorithms, CMAP and AME
    ● Compelling results against state-of-the-art methods on a variety of vision tasks
    ● Learn complex visual concepts from a simple query
  • 54. That's the end... Thanks for your valuable time :)