My dissertation work has involved proposing a higher-level visual representation that enhances the traditional part-based bag of visual words (BOW) representation in several aspects. Firstly, we introduce a new multilayer semantic significance analysis (MSSA) model to select semantically significant visual words (SSVWs) from the classical visual words in order to overcome the noisiness of the feature quantization process (in collaboration with Prof. Zhongfei (Mark) Zhang, State University of New York at Binghamton). Secondly, we represent the spatial and color constitution of an image as a mixture of n Gaussians in the feature space in order to propose a new spatial weighting scheme that weights each SSVW according to the probability that it belongs to each of the n Gaussians. Thirdly, we strengthen the discrimination power of SSVWs by constructing semantically significant visual phrases (SSVPs) from SSVWs that frequently co-occur in the same local context and are semantically coherent. Finally, we introduce a new multiclass vote-based classifier (MVBC) based on this new higher-level visual representation. Extensive large-scale experimental results show that the proposed higher-level visual representation, the Semantically Significant Invariant Visual Glossary (SSIVG), outperforms the traditional part-based image representation in retrieval, classification, object recognition, visual ranking, and visual summarization.
A Higher-Level Visual Representation For Semantic Learning In Image Databases
1. A Higher-Level Visual Representation For Semantic Learning In Image Databases. Ismail EL SAYAD, 18/07/2011
2. Overview: Introduction; Related works; Our approach: Enhanced Bag of Visual Words (E-BOW), Multilayer Semantically Significant Analysis Model (MSSA), Semantically Significant Invariant Visual Glossary (SSIVG); Experiments: Image retrieval, Image classification, Object recognition; Conclusion and perspectives
3. Motivation: Digital content grows rapidly (personal acquisition devices, broadcast TV, surveillance). It is relatively easy to store, but useless without automatic processing, classification, and retrieval. The usual way to solve this problem is to describe images with keywords, but this method suffers from subjectivity, text ambiguity, and the lack of automatic annotation.
4. Visual representations divide into image-based representations and part-based representations. Image-based representations rely on global visual features extracted over the whole image, such as color, color moments, shape, or texture.
5. The main drawbacks of image-based representations: high sensitivity to scale, pose, lighting condition changes, and occlusions; they cannot capture the local information of an image. Part-based representations are based on the statistics of features extracted from segmented image regions.
6. Part-based representations (bag of visual words): compute local descriptors over the image, cluster them in the feature space to build a visual word vocabulary (VW1, VW2, VW3, VW4, ...), then represent each image as a histogram of visual word frequencies.
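The quantize-and-count step of the BOW pipeline can be sketched in a few lines (the 2-D descriptors and 3-word vocabulary below are toy values for illustration; real systems quantize SIFT/SURF descriptors against a vocabulary of thousands of words learned by k-means):

```python
from collections import Counter

def bow_histogram(descriptors, vocabulary):
    """Quantize each local descriptor to its nearest visual word
    (Euclidean distance) and count occurrences per word."""
    def nearest(d):
        return min(range(len(vocabulary)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(d, vocabulary[i])))
    counts = Counter(nearest(d) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(vocabulary))]

# Toy 2-D "descriptors" and a hypothetical 3-word vocabulary.
vocab = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descs = [(0.1, 0.0), (0.9, 1.1), (5.2, 4.9), (0.0, 0.2)]
print(bow_histogram(descs, vocab))  # [2, 1, 1]
```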
9. Drawbacks of the BOW representation: the position of visual words is ignored; using only keypoint-based intensity descriptors, neither shape nor color information is used; and feature quantization noisiness generates unnecessary and insignificant visual words.
11. Objectives: an enhanced BOW representation using different local information (intensity, color, shape, ...), the spatial constitution of the image, and an efficient visual word vocabulary structure; and a higher-level visual representation that is less noisy, more discriminative, and more invariant to visual diversity.
12. Overview of the proposed higher-level visual representation: from a set of images, build the visual word vocabulary and the E-BOW representation, learn the MSSA model, then generate SSIVWs and SSIVPs to form the SSIVG representation.
13. Related works: Spatial Pyramid Matching Kernel (SPM) and sparse coding; visual phrase and descriptive visual phrase; visual phrase pattern and visual synset.
14. Spatial Pyramid Matching Kernel (SPM) and sparse coding. Lazebnik et al. [CVPR06] propose the Spatial Pyramid Matching Kernel (SPM), exploiting the spatial information of local regions. Yang et al. [CVPR09] combine SPM with sparse coding, replacing k-means in the SPM.
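The pyramid construction behind SPM can be sketched as follows (a minimal version that only concatenates per-cell BOW histograms; SPM's per-level weighting and the matching kernel itself are omitted, and the toy points, labels, and grid sizes are assumptions):

```python
def spm_histogram(points, labels, n_words, levels=2):
    """Spatial pyramid: concatenate per-cell BOW histograms over grids of
    1x1, 2x2, ..., 2^l x 2^l cells (coordinates normalized to [0, 1))."""
    feats = []
    for l in range(levels + 1):
        g = 2 ** l
        cells = [[0] * n_words for _ in range(g * g)]
        for (x, y), w in zip(points, labels):
            cx, cy = min(int(x * g), g - 1), min(int(y * g), g - 1)
            cells[cy * g + cx][w] += 1
        for c in cells:
            feats.extend(c)
    return feats

# Two keypoints with visual-word labels 0 and 1 (toy example).
pts, lbl = [(0.1, 0.1), (0.9, 0.9)], [0, 1]
h = spm_histogram(pts, lbl, n_words=2, levels=1)
print(len(h))  # 2 (level 0) + 4*2 (level 1) = 10
```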
15. Visual phrase and descriptive visual phrase. Zheng and Gao [TOMCCAP08] define a visual phrase as a pair of spatially adjacent local image patches. Zhang et al. [ACM MM09] select descriptive visual phrases according to the frequencies of their constituent visual word pairs.
16. Visual phrase pattern and visual synset. Yuan et al. [CVPR07] define a visual phrase pattern as a spatially co-occurring group of visual words. Zheng et al. [CVPR08] define a visual synset as a relevance-consistent group of visual words or phrases, in the spirit of the text synset.
17. Comparison of the different enhancements of the BOW.
18. Our approach: Enhanced Bag of Visual Words (E-BOW); Multilayer Semantically Significant Analysis Model (MSSA); Semantically Significant Invariant Visual Glossary (SSIVG).
19. Enhanced Bag of Visual Words (E-BOW): from a set of images, SURF and edge context extraction, feature fusion, and hierarchical feature quantization yield the E-BOW representation, which feeds the MSSA model and the SSIVG representation.
20. E-BOW feature extraction pipeline: interest point detection and edge point detection; color filtering using a vector median filter (VMF); SURF feature vector extraction at each interest point; edge context feature vector extraction at each interest point; color feature extraction at each interest and edge point; fusion of the SURF and edge context feature vectors; color and position vector clustering using a Gaussian mixture model; collection of all vectors for the whole image set; HAC and divisive hierarchical k-means clustering to build the visual word (VW) vocabulary.
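The spatial weighting idea from the abstract, weighting a visual word by the posterior probability that its (color, position) vector belongs to each Gaussian of the fitted mixture, can be illustrated with a minimal sketch (isotropic Gaussians and the toy parameters below are assumptions; the thesis fits a full Gaussian mixture model to the data):

```python
import math

def gaussian_pdf(x, mean, var):
    """Isotropic Gaussian density in len(x) dimensions (covariance = var * I)."""
    d = len(x)
    sq = sum((a - m) ** 2 for a, m in zip(x, mean))
    return math.exp(-sq / (2 * var)) / ((2 * math.pi * var) ** (d / 2))

def responsibilities(x, means, variances, priors):
    """Posterior probability that point x belongs to each mixture component."""
    joint = [p * gaussian_pdf(x, m, v) for m, v, p in zip(means, variances, priors)]
    total = sum(joint)
    return [j / total for j in joint]

# Two spatial components with illustrative parameters.
means, variances, priors = [(0.0, 0.0), (10.0, 10.0)], [1.0, 1.0], [0.5, 0.5]
r = responsibilities((0.2, -0.1), means, variances, priors)
print(r[0] > 0.99)  # True: the point almost certainly belongs to component 0
```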
21. Feature extraction (SURF). SURF is a low-level feature descriptor that describes how pixel intensities are distributed within a scale-dependent neighborhood of each interest point. It is good at handling serious blurring and image rotation, poor at handling illumination changes, and efficient to compute.
22. Feature extraction (edge context descriptor). The edge context descriptor is represented at each interest point as a histogram: 6 bins for the magnitude of the vectors drawn to the edge points and 4 bins for the orientation angle.
23. The edge context descriptor is invariant to: translation, since the distribution of the edge points is measured with respect to fixed points; scale, since the radial distance is normalized by the mean distance over the whole set of points within the same Gaussian; and rotation, since all angles are measured relative to the tangent angle of each interest point.
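As an illustration only, the binning described on the two slides above can be sketched as follows (the 6 magnitude bins, 4 orientation bins, mean-distance normalization, and tangent-relative angles come from the slides; the exact radial bin boundaries are assumptions):

```python
import math

def edge_context(interest_pt, tangent_angle, edge_pts, n_mag=6, n_ang=4):
    """Histogram of edge points around one interest point: n_mag bins for the
    normalized radial distance, n_ang bins for the angle measured relative to
    the interest point's tangent (for rotation invariance)."""
    dists = [math.hypot(ex - interest_pt[0], ey - interest_pt[1]) for ex, ey in edge_pts]
    mean_d = sum(dists) / len(dists)  # scale normalization by the mean distance
    hist = [[0] * n_ang for _ in range(n_mag)]
    for (ex, ey), d in zip(edge_pts, dists):
        ang = (math.atan2(ey - interest_pt[1], ex - interest_pt[0]) - tangent_angle) % (2 * math.pi)
        mi = min(int(d / mean_d * n_mag / 2), n_mag - 1)  # assumed radial binning
        ai = min(int(ang / (2 * math.pi) * n_ang), n_ang - 1)
        hist[mi][ai] += 1
    return hist

# Four edge points around an interest point at the origin (toy example).
h = edge_context((0.0, 0.0), 0.0, [(1, 0), (0, 1), (-1, 0), (0, -1)])
print(sum(sum(row) for row in h))  # 4: every edge point lands in exactly one bin
```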
25. Multilayer Semantically Significant Analysis (MSSA) model: starting from the E-BOW representation, the number of latent topics is estimated, the generative process is defined, the model parameters are estimated, and the semantic inference of the VWs is computed, leading to the SSIVG representation.
26. MSSA generative process: the same higher-level aspect (e.g., people) can have different visual aspects, so a topic model that considers this hierarchical structure is needed.
37. Semantically Significant Invariant Visual Glossary (SSIVG) representation: from the E-BOW representation and the learned MSSA model, SSVWs are selected through the VWs' semantic inference; SSVPs are generated from the SSVWs; a divisive information-theoretic clustering then re-indexes the SSVWs and SSVPs into SSIVWs and SSIVPs, which together form the SSIVG representation.
38. Semantically Significant Visual Word (SSVW): from the set of VWs, the set of relevant visual topics is estimated using the MSSA model, and the SSVWs are selected accordingly.
44. Semantically Significant Visual Phrase (SSVP): example phrases (SSIVP126, SSIVP326, SSIVP304) detected across different images.
45. Invariance problem: studying co-occurrence and spatial scatter information makes the image representation more discriminative, but the invariance power of SSVWs and SSVPs is still low. In text documents, synonymous words can be clustered into one synonymy set to improve document categorization performance.
46. SSIVG: a higher-level visual representation composed of two layers: Semantically Significant Invariant Visual Words (SSIVWs), which are SSVWs re-indexed after a distributional clustering, and Semantically Significant Invariant Visual Phrases (SSIVPs), which are SSVPs re-indexed after a distributional clustering. From the set of SSVWs and SSVPs and the relevant visual topics estimated using MSSA, a divisive information-theoretic clustering yields the sets of SSIVWs and SSIVPs that form the SSIVG.
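A minimal sketch of distributional clustering, assuming words are grouped by the similarity of their per-word topic distributions P(topic | word) under Jensen-Shannon divergence (a greedy single pass for illustration, not the divisive information-theoretic algorithm itself; the threshold and toy distributions are assumptions):

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Symmetric Jensen-Shannon divergence between two distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def cluster_words(word_topic, threshold=0.05):
    """Greedy grouping: a word joins the first cluster whose representative
    topic distribution lies within `threshold` JS divergence."""
    clusters = []  # list of (representative distribution, [word ids])
    for w, dist in enumerate(word_topic):
        for rep, members in clusters:
            if js(dist, rep) < threshold:
                members.append(w)
                break
        else:
            clusters.append((dist, [w]))
    return [members for _, members in clusters]

# Three "visual words" over 2 topics: words 0 and 1 share topic usage (toy values).
wt = [[0.9, 0.1], [0.88, 0.12], [0.1, 0.9]]
print(cluster_words(wt))  # [[0, 1], [2]]
```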
47. Experiments: image retrieval, image classification, object recognition.
53. Assessment of the SSIVG representation performance in image retrieval.
54. Assessment of the SSIVG representation performance in image retrieval.
57. Multiclass Vote-Based Classifier (MVBC) compared with an SVM with a linear kernel.
58. Multiclass Vote-Based Classifier (MVBC): for each SSIVW or SSIVP occurring in an image, we detect the high latent topic that maximizes its conditional probability; the final voting score for a high latent topic accumulates the votes of all occurring terms, and each image is categorized according to the dominant high latent topic.
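The voting logic described above can be sketched as follows (the table of P(topic | term) values and the optional per-term weighting are illustrative assumptions, not the thesis's fitted model or exact scoring formula):

```python
from collections import defaultdict

def mvbc_predict(image_terms, term_topic_prob, term_weight=None):
    """Each term (SSIVW/SSIVP) in the image votes for the high-level latent
    topic maximizing P(topic | term); votes accumulate (optionally weighted)
    and the image takes the topic with the highest total score."""
    term_weight = term_weight or {}
    scores = defaultdict(float)
    for t in image_terms:
        probs = term_topic_prob[t]
        best_topic = max(range(len(probs)), key=probs.__getitem__)
        scores[best_topic] += term_weight.get(t, 1.0) * probs[best_topic]
    return max(scores, key=scores.get)

# Toy P(topic | term) table for 3 terms over 2 high latent topics.
ptt = {0: [0.8, 0.2], 1: [0.7, 0.3], 2: [0.4, 0.6]}
print(mvbc_predict([0, 1, 2], ptt))  # 0: topic 0 collects 1.5 votes vs. 0.6
```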
59. Evaluation of the SSIVG representation performance in classification.
61. Assessment of the SSIVG representation performance in object recognition.
62. Conclusion and perspectives.
Talk briefly about the introduction; then talk briefly about the different parts of the approach.
Nowadays, images can be described using their visual content.
Talk about the analogy between text and images. Different visual appearance: the same visual word can appear in two different images describing different semantics.
This work aims at addressing these drawbacks with the following objectives.
All the related works are based on the BOW representation; they propose different higher-level representations.
The spatial pyramid extends the BOW representation. Example of a spatial pyramid with three different spatial levels and resolutions.
Zhang et al. enhanced this approach by selecting descriptive visual phrases from the constructed visual phrases according to the frequencies of their constituent visual word pairs.
Other approaches use an efficient structure for the visual word vocabulary only, whereas we use it for both words and phrases.
Motivation of the edge context
This generative process leads to the following conditional probability distribution. Following the maximum likelihood principle, one can estimate the parameters by maximizing the log-likelihood function.
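A hedged sketch of what such a two-layer aspect-model likelihood typically looks like, following pLSA conventions with images d, high latent topics h_l, visual latent topics z_k, and visual words w (the notation is assumed, not copied from the thesis):

```latex
% Conditional probability of a visual word given an image,
% marginalizing over the two latent layers:
P(w \mid d) \;=\; \sum_{l=1}^{L} P(h_l \mid d) \sum_{k=1}^{K} P(z_k \mid h_l)\, P(w \mid z_k)

% Log-likelihood maximized over the model parameters,
% where n(d, w) is the count of word w in image d:
\mathcal{L} \;=\; \sum_{d} \sum_{w} n(d, w) \, \log P(w \mid d)
```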
The number of high latent topics, L, and the number of visual latent topics, K, are determined in advance for the model fitting, based on the Minimum Description Length (MDL) principle.
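For reference, the MDL criterion for this kind of model selection is usually written in the following generic form, with m_{K,L} the number of free parameters of the model and N the number of observations (the exact form used in the thesis may differ):

```latex
(K, L)^{*} \;=\; \arg\min_{K, L}
\left[ -\log P\!\left(\text{data} \mid \hat{\theta}_{K,L}\right)
\;+\; \frac{m_{K,L}}{2} \log N \right]
```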
Parameters update: it will be essential to design online algorithms to continuously (re-)learn the parameters of the proposed MSSA model, as the content of digital databases is modified by the regular upload or deletion of images. Invariance issue: it will be interesting to investigate the invariance issue further, especially in the context of large-scale databases where large intra-class variations can occur. Cross-modality extension: the proposed higher-level visual representation can be extended to video content; the extension can be based on cross-modal data (visual content and textual closed captions). Video summarization: a new generic framework for video summarization based on the extended higher-level semantic representation of video content can be designed. Mention that this work is applied at the frame level.