Learning a Structured Model for Visual Category Recognition
Abstract:
This thesis deals with the problem of estimating structure in data due to the semantic relations between data elements and leveraging this information to learn a visual model for category recognition. A visual model consists of dictionary learning, which computes a succinct representation of training data by partitioning feature space, and feature encoding, which learns a representation of each image as a combination of dictionary elements. Besides variations in lighting and pose, a key challenge of classifying a category is intra-category appearance variation. The key idea in this thesis is that feature data describing a category has latent structure due to visual content idiomatic to a category. However, popular algorithms in the literature disregard this structure when computing a visual model.
Towards incorporating this structure in the learning algorithms, this thesis analyses two facets of feature data to discover relevant structure. The first is structure amongst the sub-spaces of the feature descriptor. Several sub-space embedding techniques that use global or local information to compute a projection function are analysed. A novel entropy-based measure of structure in the embedded descriptors suggests that relevant structure has local extent. The second is structure amongst the partitions of feature space. Hard partitioning of feature space leads to ambiguity in feature encoding. To address this issue, novel fuzzy-logic-based dictionary learning and feature encoding algorithms are employed that are able to model the local distributions of feature vectors and provide performance benefits.
To estimate structure amongst sub-spaces, co-clustering is used with a training descriptor data matrix to compute groups of sub-spaces. A dictionary learnt on feature vectors embedded in these multiple sub-manifolds is demonstrated to model data better than a dictionary learnt on feature vectors embedded in a single sub-manifold computed using principal components. In a similar manner, co-clustering is used with the encoded feature data matrix to compute groups of dictionary elements, referred to as 'topics'. A topic dictionary is demonstrated to perform better than a regular dictionary of comparable size. Both these results suggest that the groups of sub-spaces and dictionary elements have semantic relevance.
All the methods developed here have been viewed from the unifying perspective of matrix factorization, where a data matrix is decomposed into two factor matrices which are interpreted as a dictionary matrix and a coefficient matrix. Sparse coding methods, which are currently enjoying much success, can be viewed as matrix factorization with a regularization constraint on the vectors of the dictionary or coefficient matrices. ....
Learning a structured model for visual category recognition
1. Introduction Sub-space Embedding Fuzzy Visual Model Structure Estimation Structured Sparse Model Summary
Learning A Structured Model For Visual Category
Recognition
Ashish Gupta
University of Surrey
a.gupta@surrey.ac.uk
July 5, 2013
Introduction
Introduction : What is Category Recognition?
Feature vector Embedding : Information in Sub-Manifold.
Feature vector distribution: Fuzzy Visual Model.
Estimating semantic structure: Co-clustering.
Sparse Models: Semantically structured.
Summary & Future Work
Motivation
Visual Category?
Robot interacts with physical objects.
Object taxonomy is based on physical properties.
Robot recognizes an object using its visual appearance.
Motivation
Visual Category Model
Appearance variation → scatter of semantically related descriptors in feature space
Can this scatter distribution be estimated?
Can this structure be used to improve the learnt visual model?
Visual category model ≈ Visual object model + Estimated structure of visual category variation
Approach
Visual Classification Pipeline
Structure in sub-spaces → groups of sub-spaces → dictionary
Structure in dictionary → groups of prototypes → encoding
Approach
Feature Descriptor Matrix
Figure: Matrix of 500 D-SIFT feature descriptors from Scene-15, each of 128 dimensions (axes: feature vectors, dimensions).
Approach
Encoded Feature Matrix
Conceptual illustration of encoded feature matrix, occurrence
histogram of visual words in images.
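As a minimal sketch of how one row of such an encoded feature matrix could be computed, the snippet below builds an occurrence histogram of visual words for a single image by hard assignment to the nearest dictionary element. The function name `bow_histogram`, the toy dictionary and the toy descriptors are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def bow_histogram(descriptors, dictionary):
    """Occurrence histogram of visual words for one image.

    descriptors: (n, p) array of local feature vectors from the image.
    dictionary:  (k, p) array of visual words (e.g. K-means centres).
    """
    # Squared Euclidean distance from every descriptor to every visual word.
    diff = descriptors[:, None, :] - dictionary[None, :, :]
    dist2 = np.sum(diff ** 2, axis=2)
    # Hard assignment: each descriptor votes for its nearest visual word.
    nearest = np.argmin(dist2, axis=1)
    counts = np.bincount(nearest, minlength=dictionary.shape[0])
    # Normalise so images with different numbers of descriptors are comparable.
    return counts / counts.sum()

# Toy data (hypothetical): three visual words, four descriptors.
dictionary = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.7, 5.2]])
hist = bow_histogram(descriptors, dictionary)
```

Stacking one such histogram per image gives a matrix with images along one axis and visual words along the other.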
Approach
Conceptual Interpretation
Structure estimation can be interpreted as estimation of
semantically related rows or columns of data matrix. These are
projected to a lower dimensional space such that mutual separation
between equivalent feature vectors is reduced.
Sub-space Embedding
Feature descriptor space is high dimensional.
Relevant information is embedded in a lower dimensional
sub-manifold.
What is the appropriate lower dimensionality?
How can the efficacy of a sub-space embedding method be measured?
Measure information in embedded feature vectors.
Intrinsic Dimensionality
Estimating the intrinsic dimensionality p
Correlation Dimension
Number of feature vectors in a hypersphere of radius r is proportional to $r^p$.
Maximum Likelihood Estimate
Expectation of the number of feature vectors covered by a hypersphere of growing radius r.
Eigenvalue Estimate
Number of eigenvalues greater than a small threshold value.
Geodesic Minimum Spanning Tree
Based on the length of the GMST of k descriptors in a neighbourhood graph.
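The eigenvalue estimate is the simplest of these to illustrate. The sketch below counts the eigenvalues of the sample covariance matrix that exceed a threshold; treating the threshold as a fraction of the largest eigenvalue (`threshold=0.01`) is an assumption made here, as are the function name and the synthetic data, which lies close to a 3-dimensional sub-space of a 20-dimensional space.

```python
import numpy as np

def eigenvalue_dimension(Z, threshold=0.01):
    """Count covariance eigenvalues above `threshold` times the largest one.

    The relative threshold is an assumption of this sketch.
    """
    Zc = Z - Z.mean(axis=0)
    cov = Zc.T @ Zc / (Z.shape[0] - 1)
    eigvals = np.linalg.eigvalsh(cov)  # eigenvalues in ascending order
    return int(np.sum(eigvals > threshold * eigvals[-1]))

# Synthetic data: 3 latent dimensions, rotated into 20 dimensions, plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
Q, _ = np.linalg.qr(rng.normal(size=(20, 20)))  # random orthogonal matrix
Z = np.hstack([latent, np.zeros((500, 17))]) @ Q
Z += 1e-3 * rng.normal(size=Z.shape)
p_hat = eigenvalue_dimension(Z)
```

Because this estimate only looks at global variance, it cannot detect curved sub-manifolds; the other three estimators listed above are aimed at that case.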
Intrinsic Dimensionality
Estimated Intrinsic Dimensionality
Intrinsic Dimensionality
Subspace Embedding Methods
Global Methods
Principal Components
Multi-Dimensional Scaling
Stochastic Proximity Embedding
Isomap
Diffusion Maps
Local Methods
Locally Linear Embedding
Locality Preserving Projection
Neighbourhood Preserving Projection
Landmark Isomap
t-Stochastic Neighbourhood Embedding
Entropic Measure
Entropy Measure Intuition
Figure: 3-D scatter plots (axes X, Y, Z) of the 'swiss' synthetic data, the 'intersect' synthetic data and the 'VOC2006, car' data, together with the distribution of pair-wise distances in each data set (normalized frequency against bin index). Entropy values: swiss, H = −25.3355; intersect, H = −19.3150; VOC2006 car, H = −33.0302.
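The intuition of measuring structure through the distribution of pair-wise distances can be sketched as follows. The exact entropy definition behind the H values reported above is not given on the slide; this sketch assumes the plain Shannon entropy of a 100-bin normalized histogram, which is non-negative and therefore will not reproduce those values. The function name and the two synthetic data sets are illustrative assumptions.

```python
import numpy as np

def pairwise_distance_entropy(Z, bins=100):
    """Shannon entropy of the histogram of pair-wise Euclidean distances."""
    diff = Z[:, None, :] - Z[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=2))
    # Keep each unordered pair once (upper triangle, diagonal excluded).
    d = dist[np.triu_indices(Z.shape[0], k=1)]
    counts, _ = np.histogram(d, bins=bins)
    freq = counts / counts.sum()
    nz = freq[freq > 0]  # empty bins contribute nothing to the entropy
    return float(-np.sum(nz * np.log(nz)))

rng = np.random.default_rng(0)
# Unstructured data: points spread uniformly in a cube.
uniform = rng.uniform(size=(200, 3))
# Structured data: two tight, well-separated clusters.
clustered = np.vstack([rng.normal(0.0, 0.01, size=(100, 3)),
                       rng.normal(10.0, 0.01, size=(100, 3))])
h_uniform = pairwise_distance_entropy(uniform)
h_clustered = pairwise_distance_entropy(clustered)
```

Under this assumed definition, strongly structured data concentrates its pair-wise distances in a few bins and so has lower entropy than unstructured data.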
Empirical Results
Comparison of Embedded Entropy
Empirical Results
Computational Time Complexity
Empirical Results
Classification Performance
Empirical Results
Conclusion
The estimated intrinsic dimensionality of the 128-dimensional descriptor was in the neighbourhood of 14.
The performance of LPP in comparison to other embedding methods accentuates the importance of modelling structure in local distributions.
Fuzzy Visual Model
Structure in distribution of descriptors in feature space?
Issues with K-means clustering in the Bag-of-Words model.
Visual model incorporating Fuzzy logic framework.
Visual Ambiguity
Descriptor assignment has issues of uncertainty and
plausibility.
Kernel Codebook uses soft-assignment to resolve the
ambiguity.
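Soft assignment of this kind can be sketched as below, assuming a Gaussian kernel on the distance between a descriptor and each visual word. The bandwidth parameter `sigma`, the function name and the toy dictionary are hypothetical choices for illustration.

```python
import numpy as np

def soft_assign(descriptor, dictionary, sigma=1.0):
    """Gaussian-kernel weights of one descriptor over all visual words."""
    dist2 = np.sum((dictionary - descriptor) ** 2, axis=1)
    # Subtracting the minimum keeps the exponentials numerically stable
    # and does not change the normalised weights.
    w = np.exp(-(dist2 - dist2.min()) / (2.0 * sigma ** 2))
    return w / w.sum()

dictionary = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
# A descriptor midway between the first two visual words is ambiguous:
# hard assignment would pick one arbitrarily, soft assignment shares the vote.
weights = soft_assign(np.array([0.5, 0.0]), dictionary)
```

The weights sum to one, so an image histogram built from them remains comparable to a hard-assignment histogram.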
Fuzzy Models
Visual Dictionary
Figure: Partitions of the Motorcycle data (time against acceleration, both on a normalized scale) computed by K-means (hard partition), Fuzzy K-means and Gustafson-Kessel.
K-means:
\[ L(Z; \mu_C) = \sum_{j=1}^{r} \sum_{i \in C_j} \| z_i - \mu_{C_j} \|^2 \]
Fuzzy K-means:
\[ L(Z; D, A) = \sum_{i=1}^{r} \sum_{j=1}^{n} (\alpha_{ij})^m \| z_j - \mu_{C_i} \|^2_{\Sigma} \]
Gustafson-Kessel:
\[ L(Z; D, A, \{\Sigma_i\}) = \sum_{i=1}^{r} \sum_{j=1}^{n} (\alpha_{ij})^m \| z_j - d_i \|^2_{\Sigma_i} \]
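A minimal sketch of the fuzzy K-means updates that minimise the second objective is given below. It assumes the plain Euclidean norm (the matrix Σ taken as the identity) and the standard alternating updates of memberships and centres; the function names, the explicit initial centres `D0` and the synthetic two-blob data are assumptions made for illustration.

```python
import numpy as np

def memberships(Z, D, m):
    """Fuzzy memberships alpha[i, j] of feature vector j in cluster i."""
    dist2 = np.sum((Z[None, :, :] - D[:, None, :]) ** 2, axis=2)  # (r, n)
    dist2 = np.maximum(dist2, 1e-12)  # guard against division by zero
    inv = dist2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)  # each column sums to 1

def fuzzy_kmeans(Z, D0, m=2.0, n_iter=50):
    """Alternate membership and centre updates, starting from centres D0."""
    D = np.array(D0, dtype=float)
    for _ in range(n_iter):
        A = memberships(Z, D, m)
        W = A ** m
        # Each centre is the membership-weighted mean of all feature vectors.
        D = (W @ Z) / W.sum(axis=1, keepdims=True)
    return D, memberships(Z, D, m)

rng = np.random.default_rng(0)
blob_a = rng.normal(0.0, 0.3, size=(50, 2))
blob_b = rng.normal(5.0, 0.3, size=(50, 2))
Z = np.vstack([blob_a, blob_b])
# Initial centres: one data point from each end of the stacked data.
D, A = fuzzy_kmeans(Z, D0=Z[[0, -1]])
```

Each column of `A` is a soft encoding of one feature vector over the dictionary `D`, in contrast to the single non-zero entry produced by hard K-means assignment.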
Fuzzy Models
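Fuzzy K-means distance, with a single diagonal matrix Σ:
\[ d^2_{\Sigma}(z, \mu_C) = (z - \mu_C)^T \, \Sigma \, (z - \mu_C) \]
\[ \Sigma = \begin{pmatrix} (1/\sigma_1)^2 & 0 & \cdots & 0 \\ 0 & (1/\sigma_2)^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & (1/\sigma_p)^2 \end{pmatrix} \]
Gustafson-Kessel distance, with one matrix Σ_i per dictionary element:
\[ d^2_{\Sigma_i}(z_j, \mu_{C_i}) = (z_j - \mu_{C_i})^T \, \Sigma_i \, (z_j - \mu_{C_i}) \]
\[ F_i = \frac{\sum_{j=1}^{n} (\alpha_{ij})^m (z_j - d_i)(z_j - d_i)^T}{\sum_{j=1}^{n} (\alpha_{ij})^m} \]
\[ \Sigma_i = \left( \rho_i \det(F_i) \right)^{1/p} F_i^{-1} \]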
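The Gustafson-Kessel quantities can be sketched numerically as follows. The sketch follows the standard Gustafson-Kessel formulation, in which the norm-inducing matrix is built from the inverse of the fuzzy covariance F_i and p is the descriptor dimensionality; the function names, `rho_i = 1` and the synthetic elongated cluster are assumptions for illustration.

```python
import numpy as np

def gk_norm_matrix(Z, d_i, alpha_i, m=2.0, rho_i=1.0):
    """Fuzzy covariance F_i and norm-inducing matrix Sigma_i of one cluster."""
    w = alpha_i ** m
    diff = Z - d_i
    # Membership-weighted covariance of the feature vectors around d_i.
    F = (diff * w[:, None]).T @ diff / w.sum()
    p = Z.shape[1]
    Sigma = (rho_i * np.linalg.det(F)) ** (1.0 / p) * np.linalg.inv(F)
    return F, Sigma

def gk_distance2(z, d_i, Sigma):
    """Squared distance of z from d_i under the norm induced by Sigma."""
    v = z - d_i
    return float(v @ Sigma @ v)

rng = np.random.default_rng(0)
# One cluster, elongated along the first axis.
Z = rng.normal(size=(200, 2)) * np.array([3.0, 0.5])
d_i = Z.mean(axis=0)
alpha_i = np.ones(200)  # full membership, for illustration only
F, Sigma = gk_norm_matrix(Z, d_i, alpha_i)
d_long = gk_distance2(d_i + np.array([1.0, 0.0]), d_i, Sigma)
d_short = gk_distance2(d_i + np.array([0.0, 1.0]), d_i, Sigma)
```

The scaling by the determinant fixes det(Σ_i) = ρ_i, so each cluster can change its shape to follow the local distribution but not its volume. In the example, a unit step along the elongated axis is a smaller distance than a unit step across it.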
Empirical Results
FKM Classification Performance
Figure: Per-category classification accuracy (Acc) of Bag-of-Words and Fuzzy K-means on Scene15 and VOC2006.
Empirical Results
GK Classification Performance
Figure: Per-category classification accuracy (Acc) of Bag-of-Words and Gustafson-Kessel on Scene15 and VOC2006.
Empirical Results
Dictionary Size
Figure: Comparison of BoW with FKM and GK for different sizes of dictionary (32, 64, 128, 256, 512) on Caltech101; accuracy (Acc) against dictionary size.
Empirical Results
Aggregate Performance
Figure: Aggregate classification accuracy (Acc) of Bag-of-Words, Fuzzy K-means and Gustafson-Kessel on (a) VOC datasets (VOC2006, VOC2010) and (b) Caltech datasets (Caltech101, Caltech256).
Accuracy by visual model (rows) and data set (columns):
Visual Model   VOC-2006   VOC-2010   Caltech-101   Caltech-256
BoW            0.50825    0.52446    0.60111       0.67606
FKM            0.52635    0.53736    0.61928       0.68357
G-K            0.52885    0.54224    0.62413       0.68623
Empirical Results
Conclusion
Visual model learnt within the framework of fuzzy logic adapts
to the local distribution of feature vectors.
Learning a better fuzzy membership function is an effective alternative to learning increasingly large dictionaries to adapt to the increasing complexity of visual categories.
Co-clustering for Structure Estimation
What is co-clustering?
Co-clustering for structure in descriptor data matrix.
Co-clustering for structure in encoded feature matrix.
Co-clustering Methods
Co-clustering
Co-clustering is simultaneous, alternating row and column clustering of a data matrix.
At each step of the optimization routine, the groups of rows guide column clustering and vice versa.
\[ C_X : \{x_1, \ldots, x_m\} \to \{\hat{x}_1, \ldots, \hat{x}_k\} \]
\[ C_Y : \{y_1, \ldots, y_n\} \to \{\hat{y}_1, \ldots, \hat{y}_l\} \]
Co-clustering Methods
Co-clustering methods
Information-Theoretic Co-Clustering
Data matrix is considered a joint probability distribution.
Minimizes KL-divergence between original data and co-clustered
matrices.
Sum-Squared Residue Co-Clustering
Alternating k-means clustering of rows and columns. Minimizes the squared Euclidean distance of rows and columns from the row and column means, respectively.
Co-clustering Methods
Information-Theoretic Co-clustering
\[ I(X; Y) - I(\hat{X}; \hat{Y}) = d_{KL}\big( p(X, Y), \, q(X, Y) \big) \]
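This identity can be checked numerically for any fixed co-clustering. The sketch below assumes the usual information-theoretic co-clustering approximation q(x, y) = p(x̂, ŷ) p(x | x̂) p(y | ŷ), a strictly positive joint distribution and non-empty clusters; the function names, the random 6 x 4 matrix and the hand-picked cluster labels are illustrative assumptions.

```python
import numpy as np

def mutual_information(P):
    """Mutual information of a strictly positive joint distribution P."""
    px = P.sum(axis=1, keepdims=True)
    py = P.sum(axis=0, keepdims=True)
    return float(np.sum(P * np.log(P / (px * py))))

def itcc_loss_terms(P, row_labels, col_labels, k, l):
    """Return (loss in mutual information, KL divergence between p and q)."""
    R = np.eye(k)[row_labels]  # (m, k) row-cluster indicator matrix
    C = np.eye(l)[col_labels]  # (n, l) column-cluster indicator matrix
    P_hat = R.T @ P @ C        # (k, l) joint distribution of the co-clusters
    px, py = P.sum(axis=1), P.sum(axis=0)
    px_hat, py_hat = P_hat.sum(axis=1), P_hat.sum(axis=0)
    # q(x, y) = p(x_hat, y_hat) * p(x | x_hat) * p(y | y_hat)
    Q = (P_hat[row_labels][:, col_labels]
         * (px / px_hat[row_labels])[:, None]
         * (py / py_hat[col_labels])[None, :])
    loss = mutual_information(P) - mutual_information(P_hat)
    kl = float(np.sum(P * np.log(P / Q)))
    return loss, kl

rng = np.random.default_rng(0)
P = rng.uniform(0.1, 1.0, size=(6, 4))
P /= P.sum()  # normalise the data matrix into a joint distribution
row_labels = np.array([0, 0, 1, 1, 2, 2])
col_labels = np.array([0, 0, 1, 1])
loss, kl = itcc_loss_terms(P, row_labels, col_labels, k=3, l=2)
```

Since the two quantities agree, searching for the co-clustering that loses the least mutual information is the same as searching for the q closest to p in KL-divergence.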
Multiple Sub-spaces
Multiple Sub-spaces Intuition
\[ \sum_{i,j} d_E\big( z^{\bullet}_{i|S_l}, \, z^{\bullet}_{j|S_q} \big) > \sum_{i,j} d_E\big( z^{\bullet}_{i}, \, z^{\bullet}_{j} \big), \quad l \neq q \]
Multiple Sub-spaces
Co-clustering descriptor data matrix
Figure: Scene-15 D-SIFT data matrix (500 feature vectors of 128 dimensions), and its Information-Theoretic Co-Clustering into 10 row and 10 column clusters (axes: feature vectors, dimensions).
Multiple Sub-spaces
Dictionary on single and multiple sub-spaces
Figure: Universal PCA dictionary (VOC-2006, D-SIFT, 10 x 500, PCA + K-means) and universal CC dictionary (VOC-2006, D-SIFT, 10 x 500, SSRCC + K-means); axes: dictionary [500], dimensions [10].
Multiple Sub-spaces
Classification performance
Figure: Comparison of classification performance (F1) of single and multiple sub-space dictionaries on VOC2006 and VOC2007. First panel: Dict 10x1000, MSSD(i) 5x1000, MSSD(r) 5x1000. Second panel: Dict 10x1000, MSSD(i) 10x1000, MSSD(r) 10x1000.
Multiple Sub-spaces
Dictionary projected to multiple sub-spaces
Figure: Universal dictionary (VOC-2006, D-SIFT, 128 x 500, K-means) and universal sub-manifold dictionary (VOC-2006, D-SIFT, 128 (10) x 500, SSRCC + K-means); axes: dictionary [500], dimensions [128] grouped into sub-manifolds [10].
Multiple Sub-spaces
Classification performance
Figure: Comparison of classification performance of dictionary projected to multiple sub-spaces, on VOC2006 and VOC2007. The first panel reports F1(5) and the second F1(50); each compares Dict 128x1000, SSSD(i) 128x1000 and SSSD(r) 128x1000.
Topic Dictionary
Structure in Dictionary Intuition
Estimating groups of non-contiguous partitions of feature space
that are semantically related.
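One simple way to turn such groups into a topic encoding is sketched below: the counts of all visual words assigned to the same group are pooled into one topic bin. Pooling by summation is an assumption of this sketch, and the word-to-topic labels, which would come from column co-clustering of the encoded feature matrix, are written by hand here for illustration.

```python
import numpy as np

def topic_histogram(H, word_to_topic, n_topics):
    """Pool visual-word counts into topic counts.

    H:             (n_images, n_words) encoded feature matrix.
    word_to_topic: (n_words,) topic index of each visual word.
    """
    M = np.eye(n_topics)[word_to_topic]  # (n_words, n_topics) indicator matrix
    return H @ M

# Two images, four visual words; words 0 and 2 form one topic, 1 and 3 another.
H = np.array([[2.0, 0.0, 1.0, 3.0],
              [0.0, 4.0, 1.0, 0.0]])
word_to_topic = np.array([0, 1, 0, 1])
T = topic_histogram(H, word_to_topic, n_topics=2)
```

Note that the grouped words need not be neighbours in feature space, which is what distinguishes a topic from simply learning a smaller dictionary.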
Topic Dictionary
Topic Dictionary Concept
Topic Dictionary
Classification Performance
Comparison of classification performance of dictionaries using BoW
and ITCC, for VOC2006 and Scene15 datasets.
Topic Dictionary
Dictionary sizes
Figure: Comparative classification performance (F1) for different dictionary sizes on VOC2006, VOC2007, VOC2010, Scene15 and Caltech101. The panels compare BoW with CC:i at dictionary sizes 100, 500 and 1000.
Topic Dictionary
Conclusion
Groups of sub-spaces computed using co-clustering yielded
dictionaries with better classification performance.
Groups of feature space partitions (dictionary elements) yielded improved classification results.
These estimated groups can be used in learning a semantically
structured visual model.
Sparse Decomposition
Sparse Visual Model
Sparse model approximates a feature vector as a combination
of a sub-set of an over-complete basis set.
Sparsity is induced by adding a regularization constraint on the coefficients to the loss function.
Degree of sparsity is determined empirically.
Each basis element is considered individually.
Possible structure amongst basis elements is disregarded.
Structured Sparse Model
SSPCA (structure in sub-spaces)
Co-clustered groups of sub-spaces are used to augment Sparse-PCA to compute a Structured Sparse-PCA dictionary.
Group Lasso (structure in dictionary)
Co-clustered groups of dictionary elements are used to augment Lasso to compute a group Lasso feature encoding.
Sparse Regularization
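Sparse regularization:
\[ \min_{\alpha} \frac{1}{n} \sum_{i=1}^{n} L(z_i, D\alpha_i) + \lambda \, \Omega(\alpha) \]
Lasso:
\[ \min_{\alpha} \frac{1}{n} \sum_{i=1}^{n} \| z_i - D\alpha_i \|^2 + \lambda \| \alpha_i \|_1 \]
Group Sparsity:
\[ \min_{\alpha} \frac{1}{n} \sum_{i=1}^{n} \| z_i - D\alpha_i \|^2 + \lambda \sum_{j=1}^{k} \| \alpha_i^{G_j} \| \]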
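The difference between the Lasso and group-sparsity penalties shows up most clearly in their shrinkage (proximal) operators. The sketch below encodes a single feature vector by proximal gradient descent (ISTA); it assumes the objective 0.5 * ||z - D a||^2 + lam * penalty(a), which differs from the formulas above only by a rescaling of λ, and an l2 norm inside each group. The function names, the identity dictionary and the hand-written groups are illustrative assumptions; in the thesis the groups come from co-clustering.

```python
import numpy as np

def soft_threshold(a, t):
    """Proximal operator of the l1 penalty: shrink each coefficient by t."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def group_soft_threshold(a, groups, t):
    """Proximal operator of the group penalty: shrink each group as a block."""
    out = np.zeros_like(a)
    for g in groups:
        norm = np.linalg.norm(a[g])
        if norm > t:  # otherwise the whole group is set to zero
            out[g] = (1.0 - t / norm) * a[g]
    return out

def ista(z, D, lam, groups=None, n_iter=200):
    """Minimise 0.5 * ||z - D a||^2 + lam * penalty(a) by proximal gradient."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / largest squared singular value
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = a - step * (D.T @ (D @ a - z))  # gradient step on the data term
        if groups is None:
            a = soft_threshold(a, step * lam)
        else:
            a = group_soft_threshold(a, groups, step * lam)
    return a

D = np.eye(4)  # trivial dictionary, so the shrinkage is easy to read off
z = np.array([3.0, 0.5, 0.4, -0.3])
groups = [np.array([0, 1]), np.array([2, 3])]
a_lasso = ista(z, D, lam=1.0)
a_group = ista(z, D, lam=1.0, groups=groups)
```

With the Lasso each coefficient is kept or discarded individually, so the small second coefficient is removed. With the group penalty it survives because it belongs to a group that is strong as a whole, while the weak second group is discarded in its entirety.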
Structured Sub-space
Structured Sub-space Dictionary using ITCC
Figure: Per-category mAP of Sparse Subspace and Structured Subspace dictionaries on VOC2006 and VOC2007.
Structured Sub-space
Structured Sub-space Dictionary using SSRCC
Figure: Per-category mAP of Sparse Subspace and Structured Subspace dictionaries on VOC2006 and VOC2007.
Structured Sub-space
Data Set   Sparse Subspace   Structured Sparse Subspace (ITCC)   Structured Sparse Subspace (SSRCC)
VOC2006    67.5941           70.8295                             68.5808
VOC2007    67.9971           68.0783                             68.3718
Sparse selection of semantically related sets of sub-spaces performs better than sparse individual selection of sub-spaces.
Structured Sparse Dictionary
Structured Sparse Encoding using ITCC
Figure: Per-category mAP of Sparse Encoding and Structured Encoding on Scene15 and VOC2006, with groups computed by ITCC.
Structured Sparse Dictionary
Structured Sparse Encoding using SSRCC
Figure: Per-category mAP of Sparse Encoding and Structured Encoding on Scene15 and VOC2006, with groups computed by SSRCC.
Structured Sparse Dictionary
Data Set   Sparse Encoding   Structured Sparse Encoding (ITCC)   Structured Sparse Encoding (SSRCC)
VOC-2006   72.8386           73.3977                             72.7738
Scene-15   68.5737           79.8794                             72.1155
Sparse selection of semantically related sets of dictionary elements performs better than sparse individual selection of dictionary elements.
Summary
Learning semantically relevant structure in feature space was used to compute better visual models.
Analysis of sub-space embedding emphasized modelling local
distributions.
Incorporation of fuzzy logic framework to learn dictionary
kernels that adapt to local distributions yielded better visual
models.
Co-clustering was successful in grouping semantically related
sub-spaces and feature space partitions.
Estimated groups of sub-spaces and dictionary elements were
used to compute structured sparse visual models, improving
upon regular sparse models.
Future Work
Future Work
Visual models using Fisher Kernel coding, which uses a Gaussian kernel, have been very successful. Combining the approach in Fisher Kernels with the learnt fuzzy membership functions could potentially improve the visual model.
Fuzzy logic based learning algorithms that are more advanced
than Gustafson-Kessel could be explored to learn better
membership functions.
Co-clustering creates a block factorization of the data matrix.
Partial membership of rows and columns to the co-clusters
would be the natural next step.
Explore ways of using semantic structure to improve feature
generation techniques like hierarchical models that aim to
learn category specific descriptors.
Future Work
End
Questions...
Appendices
BoW Partitioning
Figure: Bag-of-Words partition (x against y) for image ‘000017’ in the VOC-2006 dataset. The dictionary of size 25 is computed using K-means clustering. The feature vectors are projected to 2 dimensions using PCA.
Appendices
FKM Partitioning
Figure: Fuzzy K-means fuzzy partition (x against y) for image ‘000017’ in the VOC-2006 dataset.
Appendices
GK Partitioning
Figure: Gustafson-Kessel fuzzy partition (x against y) for image ‘000017’ in the VOC-2006 dataset.