A Higher-Level Visual Representation For Semantic Learning In Image Databases. Ismail EL SAYAD, 18/07/2011
Introduction Related works Our approach Experiments Conclusion and perspectives Overview Introduction Related works Our approach Enhanced Bag of Visual Words (E-BOW) Multilayer Semantically Significant Analysis Model (MSSA) Semantically Significant Invariant Visual Glossary (SSIVG)  Experiments Image retrieval Image classification Object Recognition Conclusion and perspectives 2
Introduction Related works Our approach Experiments Conclusion and perspectives Motivation Digital content grows rapidly: personal acquisition devices, broadcast TV, surveillance. Relatively easy to store, but useless without automatic processing, classification, and retrieval. The usual way to solve this problem is by describing images with keywords; this method suffers from subjectivity, text ambiguity and the lack of automatic annotation 3
Introduction Related works Our approach Experiments Conclusion and perspectives Visual representations Image-based representations  are based on global visual features extracted  over the whole image like color, color moment, shape or texture  Visual representations Image-based  representations Part-based representations 4
Introduction Related works Our approach Experiments Conclusion and perspectives Visual representations The main drawbacks of  Image-based representations: High sensitivity to : Scale Pose   Lighting condition changes  Occlusions Cannot capture the local information of an image Part-based representations:  Based  on the statistics of features extracted from segmented image regions 5
Introduction Related works Our approach Experiments Conclusion and perspectives Visual representations: Part-based representations (Bag of visual words). (Diagram: local descriptors are computed, clustered in the feature space into a visual word vocabulary VW1, VW2, VW3, VW4, ...; each image is then represented by the frequency of each visual word, e.g. VW1: 2, VW2: 1, VW3: 1, VW4: 1.) 6
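The pipeline sketched on this slide (compute local descriptors, cluster them into a visual word vocabulary, count word frequencies) can be illustrated with a minimal pure-Python sketch; the 2-D descriptors and the three-word vocabulary below are toy values, not the thesis's actual 64-D SURF data:

```python
from collections import Counter

def quantize(descriptor, vocabulary):
    """Assign a local descriptor to the index of its nearest visual word."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)), key=lambda i: sqdist(descriptor, vocabulary[i]))

def bow_histogram(descriptors, vocabulary):
    """Bag of visual words: count how often each visual word occurs in the image."""
    counts = Counter(quantize(d, vocabulary) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(vocabulary))]

# Toy 2-D "descriptors" and a 3-word vocabulary.
vocab = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descs = [(0.1, 0.0), (0.9, 1.2), (5.1, 4.8), (4.9, 5.0)]
print(bow_histogram(descs, vocab))  # → [1, 1, 2]
```

In a real system the vocabulary comes from clustering (e.g. k-means) over descriptors of a whole training set; here it is given directly to keep the sketch short.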
Introduction Related works Our approach Experiments Conclusion and perspectives Visual representations Bag of visual words (BOW) drawbacks: Ignoring the spatial information: only the number of occurrences of each visual word is recorded, and its position is ignored. Using only keypoint-based intensity descriptors: neither shape nor color information is used. Feature quantization noisiness: unnecessary and insignificant visual words are generated 7
Introduction Related works Our approach Experiments Conclusion and perspectives Visual representations: Bag of Visual Words (BOW) drawbacks. Low discrimination power: different image semantics are represented by the same visual words. Low invariance to visual diversity: one image semantic is represented by different visual words. (Figure: the same visual word VW1364 appears in two images with different semantics, while different visual words VW330, VW480, VW263 and VW148 describe the same semantic.) 8
Introduction Related works Our approach Experiments Conclusion and perspectives Objectives  Enhanced BOW representation Different local information (intensity, color, shape…) Spatial constitution of the image Efficient visual word vocabulary structure Higher-level visual representation Less noisy  More discriminative More  invariant to the visual diversity 9
MSSA model Introduction Related works Our approach Experiments Conclusion and perspectives E-BOW Overview of the proposed higher-level visual representation SSIVWs & SSIVPs generation E-BOW representation SSIVG representation SSIVG Learning the MSSA model Visual word vocabulary building Set of images 10
Introduction Related works Our approach Experiments Conclusion and perspectives Introduction Related works Spatial Pyramid Matching Kernel (SPM)  & sparse coding  Visual phrase & descriptive visual phrase Visual phrase pattern & visual synset Our approach Experiments Conclusion and perspectives 11
Introduction Related works Our approach Experiments Conclusion and perspectives Spatial Pyramid Matching Kernel (SPM) & sparse coding  Lazebnik et al. [CVPR06] Spatial Pyramid Matching Kernel (SPM): exploiting the spatial information of location regions. Yang et al. [CVPR09] SPM + sparse coding: replacing  k-means in the SPM 12
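A minimal sketch of the SPM idea from Lazebnik et al.: concatenating per-cell BOW histograms over increasingly fine grids. It assumes keypoint coordinates normalized to [0, 1) and already-quantized visual word labels; the toy points and labels are illustrative:

```python
def spm_histogram(points, labels, vocab_size, levels=2):
    """Spatial pyramid: concatenate BOW histograms over a 2^l x 2^l grid
    for each pyramid level l. points are (x, y) in [0, 1); labels are
    the visual-word indices of the corresponding keypoints."""
    feature = []
    for level in range(levels):
        cells = 2 ** level
        hist = [[0] * vocab_size for _ in range(cells * cells)]
        for (x, y), w in zip(points, labels):
            cx = min(int(x * cells), cells - 1)
            cy = min(int(y * cells), cells - 1)
            hist[cy * cells + cx][w] += 1
        for h in hist:
            feature.extend(h)
    return feature

pts = [(0.2, 0.2), (0.8, 0.8)]
lbl = [0, 1]
# Level 0 histogram followed by the four level-1 cell histograms.
print(spm_histogram(pts, lbl, vocab_size=2, levels=2))
```

The full SPM kernel additionally weights each level when matching two pyramids; only the pyramid construction is shown here.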
Introduction Related works Our approach Experiments Conclusion and perspectives Visual phrase & descriptive visual phrase Zheng and Gao [TOMCCAP08] Visual phrase:  pair of spatially adjacent local image patches Zhang et al. [ACM MM09]  Descriptive visual phrase: selected according to the frequencies of its constituent visual word pairs 13
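The visual phrase construction of Zheng and Gao (pairs of spatially adjacent local patches) can be sketched as follows; the adjacency radius and the toy keypoints are illustrative assumptions:

```python
from itertools import combinations

def visual_phrases(keypoints, radius):
    """Form a visual phrase from each pair of visual words whose keypoints
    are spatially adjacent (closer than `radius`). Each keypoint is a
    ((x, y), word_id) tuple."""
    phrases = []
    for (p1, w1), (p2, w2) in combinations(keypoints, 2):
        if (p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2 <= radius ** 2:
            phrases.append(tuple(sorted((w1, w2))))  # order-independent pair
    return phrases

kps = [((0, 0), 3), ((1, 0), 7), ((10, 10), 3)]
print(visual_phrases(kps, radius=2))  # → [(3, 7)]
```

Descriptive visual phrases would then be selected from these pairs by frequency, as the slide notes.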
Introduction Related works Our approach Experiments Conclusion and perspectives Visual phrase pattern & visual synset Yuan et al. [CVPR07]  Visual phrase pattern: spatially co-occurring group of visual words Zheng et al. [CVPR08]  Visual synset: relevance-consistent group of visual words or phrases, in the spirit of the text synset 14
Introduction Related works Our approach Experiments Conclusion and perspectives Comparison of the different enhancements of  the BOW  15
Introduction Related works Our approach Experiments Conclusion and perspectives Introduction Related works Our approach  Enhanced Bag of Visual Words (E-BOW) Multilayer Semantically Significant Analysis Model (MSSA) Semantically Significant Invariant Visual Glossary (SSIVG)  Experiments Conclusion and perspectives 16
Introduction Related works Our approach Experiments Conclusion and perspectives  Enhanced Bag of Visual Words (E-BOW) Set of images E-BOW MSSA model SSIVG SURF & Edge Context extraction Features fusion Hierarchical feature quantization E-BOW representation 17
Introduction Related works Our approach Experiments Conclusion and perspectives  Enhanced Bag of Visual Words (E-BOW): Feature extraction. Interest points detection; edge points detection; color filtering using the vector median filter (VMF); SURF feature vector extraction at each interest point; color feature extraction at each interest and edge point; fusion of the SURF and edge context feature vectors; color and position vector clustering using a Gaussian mixture model; Edge Context feature vector extraction at each interest point; collection of all vectors for the whole image set (∑1 µ1 Pi1, ∑2 µ2 Pi2, ∑3 µ3 Pi3); HAC and Divisive Hierarchical K-Means clustering; VW vocabulary 18
Introduction Related works Our approach Experiments Conclusion and perspectives  Enhanced Bag of Visual Words (E-BOW): Feature extraction (SURF). SURF is a low-level feature descriptor: it describes how the pixel intensities are distributed within a scale-dependent neighborhood of each interest point. Good at handling serious blurring and image rotation; poor at handling illumination changes; computationally efficient 19
Introduction Related works Our approach Experiments Conclusion and perspectives Enhanced Bag of Visual Words (E-BOW): Feature extraction (Edge Context descriptor). The edge context descriptor is represented at each interest point as a histogram: 6 bins for the magnitude of the vectors drawn to the edge points and 4 bins for the orientation angle 20
Introduction Related works Our approach Experiments Conclusion and perspectives  Enhanced Bag of Visual Words (E-BOW)Feature extraction (Edge context descriptor)  This descriptor is invariant to : Translation : The distribution of the edge points is measured with respect to fixed  points Scale: The radial distance is normalized by a mean distance between the whole set of points within the same Gaussian Rotation: All angles are measured relative to the tangent angle of each interest point 21
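Based on the slide's description (6 magnitude bins x 4 orientation bins, mean-normalized distances, tangent-relative angles), a hedged sketch of such an edge context histogram; the choice of binning distances up to twice the mean is an assumption, not taken from the thesis:

```python
import math

def edge_context(interest_pt, tangent_angle, edge_points, r_bins=6, a_bins=4):
    """Histogram of the vectors drawn from one interest point to the edge
    points: r_bins over the mean-normalized distance (scale invariance),
    a_bins over the angle relative to the tangent (rotation invariance)."""
    vecs = [(ex - interest_pt[0], ey - interest_pt[1]) for ex, ey in edge_points]
    dists = [math.hypot(dx, dy) for dx, dy in vecs]
    mean_d = sum(dists) / len(dists)  # normalize by the mean radial distance
    hist = [0] * (r_bins * a_bins)
    for (dx, dy), d in zip(vecs, dists):
        # Distances binned up to 2x the mean (assumed cutoff).
        r = min(int(d / mean_d / (2.0 / r_bins)), r_bins - 1)
        a = (math.atan2(dy, dx) - tangent_angle) % (2 * math.pi)
        a_idx = min(int(a / (2 * math.pi / a_bins)), a_bins - 1)
        hist[r * a_bins + a_idx] += 1
    return hist

h = edge_context((0, 0), 0.0, [(1, 0), (0, 1), (-1, 0), (0, -1)])
print(sum(h), len(h))  # → 4 24
```

Translation invariance comes for free, since only relative vectors enter the histogram.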
Introduction Related works Our approach Experiments Conclusion and perspectives Enhanced Bag of Visual Words (E-BOW): Hierarchical feature quantization. Hierarchical Agglomerative Clustering (HAC) is stopped at the desired level to obtain k clusters; Divisive Hierarchical K-Means clustering then determines the tree level by level, each node divided into k parts, down to some maximum number of levels L. (Figure: merged features in the feature space; a cluster at k = 4.) 22
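Once such a tree is built, assigning a descriptor to a visual word is a walk from the root to a leaf, comparing against only k centroids per level. A minimal sketch with a hypothetical two-level, k = 2 tree over 1-D descriptors:

```python
def assign_word(descriptor, node):
    """Walk a divisive hierarchical k-means tree: at each internal node pick
    the nearest of the k child centroids, until a leaf (= visual word id)."""
    while isinstance(node, dict):  # internal node: {'centroids': [...], 'children': [...]}
        d2 = [sum((a - b) ** 2 for a, b in zip(descriptor, c))
              for c in node['centroids']]
        node = node['children'][d2.index(min(d2))]
    return node  # leaf: the visual word index

# Hypothetical toy tree: two clusters at the top, each split into two words.
tree = {'centroids': [(0.0,), (10.0,)],
        'children': [{'centroids': [(0.0,), (2.0,)], 'children': [0, 1]},
                     {'centroids': [(9.0,), (11.0,)], 'children': [2, 3]}]}
print(assign_word((1.9,), tree))  # → 1
```

This is why the hierarchical vocabulary is efficient: lookup cost grows with the tree depth, not with the total number of visual words.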
Introduction Related works Our approach Experiments Conclusion and perspectives Multilayer Semantically Significant Analysis (MSSA) model  Set of images E-BOW MSSA model SSIVG VWs semantic inference estimation SURF & Edge Context extraction Number of latent topics estimation Features fusion  Parameters estimation Hierarchical feature quantization Generative process E-BOW representation 23
Introduction Related works Our approach Experiments Conclusion and perspectives Multilayer Semantically Significant Analysis (MSSA) model Generative Process. Different visual aspects; a topic model that considers this hierarchical structure is needed. Higher-level aspect: People 24
Introduction Related works Our approach Experiments Conclusion and perspectives  Multilayer Semantically Significant Analysis (MSSA) model Generative Process (parameters φ, Θ, Ψ). In the MSSA, there are two different latent (hidden) topics: a high latent topic that represents the higher-level semantic aspects, and a visual latent topic that represents the visual aspects. (Plate diagram: variables w, v, h per image im; M images, N visual words.) 25
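The slide's probability formulas did not survive the export. A plausible reconstruction, assuming $w$ denotes a visual word, $im$ an image, $h$ a high latent topic and $v$ a visual latent topic (the symbols are inferred from context, not copied from the thesis):

```latex
P(w \mid im) = \sum_{h} P(h \mid im) \sum_{v} P(v \mid h)\, P(w \mid v),
\qquad
\mathcal{L} = \sum_{im} \sum_{w} n(im, w)\, \log P(w \mid im),
```

where $n(im, w)$ is the occurrence count of visual word $w$ in image $im$, and the three conditional distributions correspond to the parameter sets φ, Θ, Ψ shown on the slide.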
Introduction Related works Our approach Experiments Conclusion and perspectives Multilayer Semantically Significant Analysis (MSSA) model Parameter Estimation. The generative process leads to a conditional probability distribution; following the maximum likelihood principle, the parameters are estimated by maximizing the log-likelihood function. Gaussier et al. [ACM SIGIR05]: maximizing the likelihood can be seen as a Nonnegative Matrix Factorization (NMF) problem under the generalized KL divergence 26
Introduction Related works Our approach Experiments Conclusion and perspectives Multilayer Semantically Significant Analysis (MSSA) model Parameter Estimation. Minimizing this objective leads to the following multiplicative update rules 27
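The update rules themselves did not survive the export. For reference, a sketch of the classical Lee-Seung multiplicative updates for NMF under the generalized KL divergence, the optimization problem Gaussier et al. relate to this family of models (not the thesis's exact MSSA updates, which involve the extra topic layer):

```python
import numpy as np

def nmf_kl(V, k, iters=200, seed=0):
    """Factor a nonnegative matrix V ~ W @ H by multiplicative updates
    that monotonically decrease the generalized KL divergence D(V || WH)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 0.1
    H = rng.random((k, m)) + 0.1
    ones = np.ones_like(V)
    for _ in range(iters):
        WH = W @ H + 1e-12
        H *= (W.T @ (V / WH)) / (W.T @ ones + 1e-12)
        WH = W @ H + 1e-12
        W *= ((V / WH) @ H.T) / (ones @ H.T + 1e-12)
    return W, H

V = np.array([[2.0, 0.0], [0.0, 3.0]])
W, H = nmf_kl(V, k=2, iters=500)
print(np.round(W @ H, 2))  # approximately reconstructs V
```

Each factor is updated while the other is held fixed, mirroring the alternating structure of EM for topic models.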
Introduction Related works Our approach Experiments Conclusion and perspectives Multilayer Semantically Significant Analysis (MSSA) model: Number of Latent Topics Estimation. The number of high latent topics (L) and the number of visual latent topics (K) are determined in advance for the model fitting, based on the Minimum Description Length (MDL) principle, which balances the log-likelihood against the number of free parameters 28
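The MDL formula itself was lost in the export; a standard form of this selection criterion, with $\mathcal{L}(L, K)$ the maximized log-likelihood, $m_{L,K}$ the number of free parameters and $N$ the number of observations (a hedged reconstruction, not necessarily the thesis's exact expression):

```latex
(\hat{L}, \hat{K}) = \arg\min_{L,\, K} \left[ -\mathcal{L}(L, K) + \frac{m_{L,K}}{2} \log N \right]
```

The penalty term grows with both L and K, so the criterion rejects topic counts whose likelihood gain does not pay for the added parameters.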
Introduction Related works Our approach Experiments Conclusion and perspectives Semantically Significant Invariant Visual Glossary (SSIVG) representation Set of images E-BOW MSSA model SSIVG VWs semantic inference estimation SURF & Edge Context extraction SSVP representation SSIVG representation Number of latent topics estimation Features fusion SSVPs generation SSIVP representation Parameters estimation Hierarchical feature quantization SSVW representation Divisive information-theoretic clustering SSIVW representation Generative process E-BOW representation SSVWs selection 29
Introduction Related works Our approach Experiments Conclusion and perspectives Semantically Significant Invariant Visual Glossary (SSIVG) representation: Semantically Significant Visual Word (SSVW). (Diagram: set of VWs → estimating using MSSA → set of relevant visual topics → estimating using MSSA → set of SSVWs) 30
Introduction Related works Our approach Experiments Conclusion and perspectives Semantically Significant Invariant Visual Glossary (SSIVG) representation: Semantically Significant Visual Phrase (SSVP). SSVP: a higher-level and more discriminative representation built from SSVWs and their inter-relationships. SSVPs are formed from SSVW sets that satisfy all of the following conditions: they are involved in strong association rules, with high support and confidence; and they have the same semantic meaning, i.e. a high probability related to at least one common visual latent topic 31
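Support and confidence over co-occurring SSVWs can be computed with a single pass over the image set; the thresholds and the toy transactions below are illustrative, not values from the thesis:

```python
from itertools import combinations

def strong_pairs(images, min_support, min_confidence):
    """Mine SSVW pairs involved in strong association rules.
    support    = fraction of images containing both words
    confidence = P(other word present | one word present), best direction."""
    n = len(images)
    item_count, pair_count = {}, {}
    for words in images:
        ws = set(words)
        for w in ws:
            item_count[w] = item_count.get(w, 0) + 1
        for p in combinations(sorted(ws), 2):
            pair_count[p] = pair_count.get(p, 0) + 1
    strong = []
    for (a, b), c in pair_count.items():
        support = c / n
        confidence = max(c / item_count[a], c / item_count[b])
        if support >= min_support and confidence >= min_confidence:
            strong.append((a, b))
    return strong

imgs = [{1, 2, 3}, {1, 2}, {1, 2, 4}, {3, 4}]
print(strong_pairs(imgs, min_support=0.5, min_confidence=0.9))  # → [(1, 2)]
```

The surviving pairs would then be filtered by the shared-visual-topic condition before becoming SSVPs.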
Introduction Related works Our approach Experiments Conclusion and perspectives Semantically Significant Invariant Visual Glossary (SSIVG) representation: Semantically Significant Visual Phrase (SSVP). (Figure: the same phrases SSIVP126, SSIVP326 and SSIVP304 detected repeatedly across images.) 32
Introduction Related works Our approach Experiments Conclusion and perspectives Semantically Significant Invariant Visual Glossary (SSIVG) representation: Invariance problem. Studying the co-occurrence and spatial scatter information makes the image representation more discriminative, but the invariance power of SSVWs and SSVPs is still low. In text documents, synonymous words can be clustered into one synonymy set to improve the document categorization performance 33
Introduction Related works Our approach Experiments Conclusion and perspectives Semantically Significant Invariant Visual Glossary (SSIVG) representation. SSIVG: a higher-level visual representation composed of two different layers of representation: Semantically Significant Invariant Visual Words (SSIVW), re-indexed SSVWs after a distributional clustering, and Semantically Significant Invariant Visual Phrases (SSIVP), re-indexed SSVPs after a distributional clustering. (Diagram: set of SSVWs and SSVPs → estimating using MSSA → set of relevant visual topics → divisive information-theoretic clustering → sets of SSIVWs and SSIVPs → set of SSIVGs) 34
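Distributional clustering of this kind groups words whose topic distributions are close, the visual analogue of merging text synonyms. A greedy sketch using Jensen-Shannon divergence (the actual divergence and clustering procedure in the thesis may differ):

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two topic distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def distributional_clusters(word_topic, threshold):
    """Greedily merge visual words whose P(topic | word) distributions are
    closer than `threshold` to a cluster's first member, yielding
    synonym-like invariant sets."""
    clusters = []
    for word, dist in word_topic.items():
        for cl in clusters:
            if js_divergence(dist, word_topic[cl[0]]) < threshold:
                cl.append(word)
                break
        else:
            clusters.append([word])
    return clusters

# Hypothetical P(topic | word) rows over two visual latent topics.
wt = {'vw1': [0.9, 0.1], 'vw2': [0.85, 0.15], 'vw3': [0.1, 0.9]}
print(distributional_clusters(wt, threshold=0.05))  # → [['vw1', 'vw2'], ['vw3']]
```

Each resulting cluster would be re-indexed as one SSIVW (or SSIVP), so visually diverse words with the same semantics share an id.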
Introduction Related works Our approach Experiments Conclusion and perspectives Experiments Introduction Related works Our approach Experiments Image retrieval Image classification Object Recognition Conclusion and perspectives 35
Introduction Related works Our approach Experiments Conclusion and perspectives Assessment of the SSIVG representation performance in image retrieval. Evaluation criteria:

The traditional Vector Space Model of Information Retrieval is adapted; the weighting for the SSIVPs; the inverted file structure 36
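The adapted Vector Space Model with an inverted file can be sketched as follows; the tf-idf weighting shown is the textbook form, not necessarily the exact SSIVP/SSIVW weighting used in the thesis:

```python
import math
from collections import defaultdict

def build_index(docs):
    """Inverted file: term -> list of (doc_id, term_frequency) postings."""
    index = defaultdict(list)
    for doc_id, terms in enumerate(docs):
        counts = {}
        for t in terms:
            counts[t] = counts.get(t, 0) + 1
        for t, tf in counts.items():
            index[t].append((doc_id, tf))
    return index

def retrieve(query, index, n_docs):
    """Score documents with tf-idf, touching only the query terms' postings."""
    scores = defaultdict(float)
    for t in set(query):
        postings = index.get(t, [])
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings:
            scores[doc_id] += tf * idf
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical images described by SSIVW/SSIVP ids.
docs = [['ssivw3', 'ssivp7', 'ssivw3'], ['ssivw3'], ['ssivp9']]
idx = build_index(docs)
print(retrieve(['ssivp7'], idx, len(docs)))  # → [0]
```

The inverted file is what keeps retrieval fast: only images containing at least one query term are ever scored.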
Introduction Related works Our approach Experiments Conclusion and perspectives Assessment of the SSIVG representation performance in image retrieval 37
Introduction Related works Our approach Experiments Conclusion and perspectives Assessment of the SSIVG representation performance in image retrieval 38
Evaluation of the SSIVG representation in image classification: SVM with linear kernel; Multiclass Vote-Based Classifier (MVBC) 39
Introduction Related works Our approach Experiments Conclusion and perspectives Evaluation of the SSIVG representation in image classification Multiclass Vote-Based Classifier (MVBC): for each SSIVG occurrence, we detect the high latent topic that maximizes its semantic inference; the final voting score for a high latent topic accumulates these votes; each image is categorized according to the dominant high latent topic 40
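A hedged sketch of such a vote-based classifier; weighting each vote by its posterior is an assumption, since the slide's exact scoring formula did not survive the export:

```python
def mvbc_classify(ssivg_ids, p_topic_given_g, n_topics):
    """Multiclass vote-based classification: each SSIVG occurrence votes for
    the high latent topic maximizing P(topic | SSIVG); the image takes the
    topic with the largest accumulated vote."""
    votes = [0.0] * n_topics
    for g in ssivg_ids:
        dist = p_topic_given_g[g]
        best = max(range(n_topics), key=lambda t: dist[t])
        votes[best] += dist[best]  # weight the vote by its confidence (assumed)
    return max(range(n_topics), key=lambda t: votes[t])

# Hypothetical P(high topic | SSIVG) table over two topics.
p = {'g1': [0.8, 0.2], 'g2': [0.6, 0.4], 'g3': [0.1, 0.9]}
print(mvbc_classify(['g1', 'g2', 'g3'], p, n_topics=2))  # → 0
```

Unlike the SVM baseline on the same slide, this classifier needs no extra training beyond the MSSA inference itself.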
Introduction Related works Our approach Experiments Conclusion and perspectives Evaluation of the SSIVG representation performance in classification 41
Introduction Related works Our approach Experiments Conclusion and perspectives Assessment of the SSIVG Representation Performance in Object Recognition 43
Introduction Related works Our approach Experiments Conclusion and perspectives Experiments Introduction Related works Our approach Experiments Conclusion and perspectives 44
Conclusion: new local feature descriptor (Edge Context); efficient visual word vocabulary structure; new Multilayer Semantic Significance (MSSA) model; semantic inferences of different layers of representation; more invariant to visual diversity; outperforms other state-of-the-art works 45
Perspectives: on-line algorithms to continuously (re-)learn the parameters; invariance in the context of large-scale databases where large intra-class variations can occur; cross-modal data (visual and textual closed-captions contents); a new generic framework of video summarization; study of the semantic coherence between visual contents and textual captions 46
Thank you for your attention! ismail.elsayad@lifl.fr Questions?
Introduction Related works Our approach Experiments Conclusion and perspectives Parameter Settings 48

Editor's Notes

  1. Talk briefly about the introduction. Talk about the different parts of the approach briefly.
  2. Nowadays, images can be described using their visual content.
  3. Talk about the analogy between text and images. Different visual appearance: the same visual word can appear in two different images describing different semantics.
  4. This work aims at addressing these drawbacks with the following
  5. All the related works are based on BOW representation, they propose different Higher-level representation
  6. The spatial pyramid extends the BOW representation. Example of a spatial pyramid with three different spatial levels and resolutions.
  7. enhanced this approach by selecting descriptive visual phrases from the constructed visual phrases according to the frequencies of their constituent visual word pairs.
  8. The efficient structures in other approaches are used for the visual word vocabulary only, but we use one for both words and phrases.
  9. Motivation of the edge context
  10. Add training images
  11. This generative process leads to the following conditional probability distribution. Following the maximum likelihood principle, one can estimate the parameters by maximizing the log-likelihood function as follows:
  12. The number of high latent topics, L, and the number of visual latent topics, K, are determined in advance for the model fitting, based on the Minimum Description Length (MDL) principle.
  13. Check the size of the cylinders
  14. See right part
  15. Correct the animationMake boxes bigger
  16. Global approach slide before this slide
  17. Add parameter settings
  18. Descriptive: correct spelling and upper cases. Add references.
  19. Add this slide at the end of the presentations and add sub points to slide before
  20. Upper case corrections
  21. Global approach slide before this slide
  22. Add parameter settings
  24. Parameters update: It will be essential to design on-line algorithms to continuously (re-)learn the parameters of the proposed MSSA model, as the content of digital databases is modified by the regular upload or deletion of images.Invariance issue: It will be interesting to investigate more on the invariance issue especially in the context large-scale databases where large intra-class variations can occur.Cross-modalitily extension: The proposed higher-level visual representation can be extended to video content. The extension can be based on cross-modal data (visual and textual closed captions contents). Video summarization: A new generic framework of video summarization based on the extended higher-level semantic representation of video content can be designed.Talk that this work is applied at the frame level